A Comprehensive Guide to AWS CLI Redshift | 6 Easy Steps

• March 22nd, 2022


Amazon Redshift is a data warehouse solution offered by Amazon. It runs on AWS infrastructure and is built for high performance. Because Redshift is a columnar database, it is well suited to aggregating huge volumes of data and to parallel processing. This makes it a suitable data warehouse platform even for large companies that handle terabytes of data.

When using Amazon Redshift, you will have to manage your Redshift clusters. One way to perform these administrative tasks is by running commands in the AWS CLI (Command Line Interface), whose redshift commands cover cluster management. Familiarizing yourself with these commands makes Redshift much easier to work with.

In this article, you will learn how to use the AWS CLI with Redshift to work with clusters, including creating, modifying, and deleting them.


Prerequisites

This is what you need for this article:

  • An AWS account
  • The AWS CLI installed and configured with credentials for that account

What is AWS Redshift?

Amazon Redshift is a petabyte-scale cloud data warehouse solution that forms part of the AWS cloud platform. It provides its users with a platform where they can store their data and analyze it to extract business insights.

Traditionally, individuals and businesses had to prepare predictions and other forecasts manually. Redshift handles much of the data analysis work, freeing your time for other tasks, and it can also apply predictive analytics to your data. You can then use the results to make informed business decisions that drive the growth of your enterprise. To learn more about Amazon Redshift, visit its official documentation.

Key Features of Amazon Redshift

Here are some key features of Amazon Redshift:

  • Column-oriented Databases: Redshift stores data by column, so queries over massive volumes of data read only the columns they need, which makes them faster. Redshift is designed for OLAP (analytical) workloads, and SELECT operations are optimized.
  • Security: Amazon Redshift uses SSL to encrypt data in transit and hardware-accelerated AES-256 encryption for data at rest. With encryption enabled, all data saved to disk, including backup files, is encrypted. You don't have to handle key management yourself, because AWS can manage the keys for you.
  • Cost-effective: Amazon positions Redshift as one of the most cost-effective cloud data warehousing options, estimating its cost at about one-tenth that of a traditional on-premises data warehouse. There are no hidden fees; customers pay only for the services they use.
  • Scalable: Redshift is a petabyte-scale data warehouse that is easy to use and scales to meet your needs. You can quickly change the number or type of nodes in your data warehouse with a few clicks or a simple API call, scaling up or down as needed.
  • Fault Tolerance: Redshift constantly checks the cluster’s health, and if a hard drive fails, it will immediately re-replicate data from the failed drives and replace nodes as needed for fault tolerance.

Simplify Redshift’s ETL & Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ Data Sources (including 40+ Free Sources). Setting it up is a 3-step process: select the data source, provide valid credentials, and choose the destination, such as Amazon Redshift.

Hevo loads the data onto the desired Data Warehouse/destination in real time, enriches it, and transforms it into an analysis-ready form without you having to write a single line of code. Its fully automated pipeline and fault-tolerant, scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.


Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled securely and consistently with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to learn and use.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo transfers only the data that has been modified, in real time. This ensures efficient use of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.


Working with AWS CLI Redshift

In this section, we will discuss how to use the AWS CLI to perform the most basic Redshift cluster administrative tasks.

Creating a Cluster in AWS CLI RedShift

You can use the AWS CLI to create a new Redshift cluster with the create-cluster command. This command accepts both required and optional parameters, and it is worth reviewing them before configuring your cluster. Here are the parameters used in this example:

  • cluster-identifier: A unique identifier or the name of the cluster. You will be using this identifier to access, update, or delete the cluster. The following constraints must be adhered to when creating the cluster identifier:
    • Must have 1 – 63 alphanumeric characters or hyphens.
    • Alphabetic characters must be written in lowercase.
    • The first character must be a letter.
    • Cannot have two consecutive hyphens or end with a hyphen.
    • Must be unique for the clusters within the Amazon Web Services account.
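  • master-username: The user name of the admin (master) user for the cluster's database.
  • master-user-password: The password of that admin user. It must be 8 to 64 characters long, contain at least one uppercase letter, one lowercase letter, and one number, and cannot contain single quotes, double quotes, backslashes, slashes, @, or spaces.
  • node-type: The node type, which determines the compute and storage size of each node.
  • cluster-type: The type of cluster to create. It takes one of two values, multi-node or single-node, and defaults to multi-node. A multi-node cluster also requires the number-of-nodes parameter.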
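Because a name that breaks these rules is rejected, it can help to check the identifier locally before calling create-cluster. The following is a minimal POSIX shell sketch; the function name is_valid_cluster_identifier is hypothetical, and it cannot check the last rule (uniqueness within the account), which only AWS can verify.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the argument satisfies the local
# cluster-identifier rules listed above, 1 otherwise.
# Uniqueness within the AWS account is NOT checked here.
is_valid_cluster_identifier() {
  id=$1
  # Rule: 1 to 63 characters
  if [ "${#id}" -lt 1 ] || [ "${#id}" -gt 63 ]; then
    return 1
  fi
  case $id in
    # Rule: only lowercase letters, digits and hyphens
    # (letters spelled out to avoid locale-dependent ranges like a-z)
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    # Rule: the first character must be a letter
    [!abcdefghijklmnopqrstuvwxyz]*) return 1 ;;
    # Rule: no two consecutive hyphens and no trailing hyphen
    *--*|*-) return 1 ;;
  esac
  return 0
}
```

For example, is_valid_cluster_identifier mysamplecluster succeeds, while MySampleCluster or my--cluster fail.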

The following command demonstrates how to create a cluster named mysamplecluster using the AWS CLI Redshift:

aws redshift create-cluster --cluster-identifier mysamplecluster --master-username masteruser --master-user-password Passw0rd123 --node-type dc2.large --cluster-type single-node
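A single-node cluster needs no node count, but a multi-node cluster also takes --number-of-nodes. The sketch below wraps that call in a shell function; the function name, the REDSHIFT_MASTER_PASSWORD environment variable, and the dc2.large node type are assumptions for illustration, so check which node types are offered in your Region. Reading the password from a variable keeps the literal value out of your shell history.

```shell
#!/bin/sh
# Hypothetical wrapper: create a two-node cluster named $1.
# Assumes a configured AWS CLI and a password in REDSHIFT_MASTER_PASSWORD.
create_multi_node_cluster() {
  cluster_id=$1
  # Stop with an error message if the variable is unset or empty
  : "${REDSHIFT_MASTER_PASSWORD:?set REDSHIFT_MASTER_PASSWORD first}"
  aws redshift create-cluster \
    --cluster-identifier "$cluster_id" \
    --master-username masteruser \
    --master-user-password "$REDSHIFT_MASTER_PASSWORD" \
    --node-type dc2.large \
    --cluster-type multi-node \
    --number-of-nodes 2
}
```

Usage: export the password first, then run create_multi_node_cluster mysamplecluster.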

Creating the cluster takes several minutes. You can check its progress by running the following command:

aws redshift describe-clusters --cluster-identifier mysamplecluster

While the cluster is being created, the ClusterStatus field in the output reads creating. When it changes to available, the cluster is ready to use.
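Rather than reading the whole JSON document, you can ask for just the status with the global --query option, or let the CLI's built-in waiter poll for you. A minimal sketch, assuming a configured AWS CLI; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; assume the AWS CLI is installed and configured.

# Print only the ClusterStatus field of cluster $1
cluster_status() {
  # --query takes a JMESPath expression; --output text prints the bare value
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].ClusterStatus' \
    --output text
}

# Block until cluster $1 reports "available"; the waiter polls
# describe-clusters and fails if its retry limit is reached first
wait_until_available() {
  aws redshift wait cluster-available --cluster-identifier "$1"
}
```

cluster_status mysamplecluster prints a single word such as creating or available, and wait_until_available mysamplecluster returns once the cluster is ready.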

Authorizing Inbound Traffic to The Cluster in AWS CLI Redshift

To connect to the cluster, you must grant inbound access to the client explicitly. The client can be an external computer or an Amazon EC2 instance. 

Because we did not assign the cluster to a security group when creating it, it was associated with the default cluster security group. This security group has no rules that authorize inbound traffic, so to access the new cluster we need to add inbound rules, known as ingress rules. Note that cluster security groups apply only to clusters on the EC2-Classic platform; clusters launched in a VPC are controlled by VPC security groups instead.

The following command can help you to allow network ingress to the cluster:

aws redshift authorize-cluster-security-group-ingress --cluster-security-group-name default --cidrip 192.0.2.0/24
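For a cluster launched in a VPC, the equivalent step is to add an inbound rule to the cluster's VPC security group with the EC2 commands. The sketch below assumes the cluster listens on the default Redshift port, 5439, and that its first VPC security group is the one you want to change; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: open port 5439 on the first VPC security group of
# cluster $1 to the CIDR range $2. Assumes a configured AWS CLI.
allow_vpc_ingress() {
  cluster_id=$1
  cidr=$2
  # Look up the security group ID attached to the cluster
  sg_id=$(aws redshift describe-clusters \
    --cluster-identifier "$cluster_id" \
    --query 'Clusters[0].VpcSecurityGroups[0].VpcSecurityGroupId' \
    --output text) || return 1
  # Add an inbound TCP rule for the default Redshift port
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 5439 \
    --cidr "$cidr"
}
```

Usage: allow_vpc_ingress mysamplecluster 192.0.2.0/24. If your cluster uses a different port, change 5439 accordingly.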

Describing Cluster Security Groups in AWS CLI Redshift

You may need more information about your Redshift cluster security groups. The describe-cluster-security-groups command provides it. If you specify a security group name, it returns information about that security group only.

If you specify tag keys and tag values, the command returns all security groups that match any combination of those keys and values. Otherwise, it returns information about all security groups.
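The two filtered forms described above can be sketched as follows. The function names are hypothetical, and the tag key and value in the usage line (environment, dev) are made-up examples.

```shell
#!/bin/sh
# Hypothetical wrappers around describe-cluster-security-groups.

# Information about one named cluster security group ($1)
describe_named_group() {
  aws redshift describe-cluster-security-groups \
    --cluster-security-group-name "$1"
}

# Security groups matching tag key $1 and tag value $2
describe_groups_by_tag() {
  aws redshift describe-cluster-security-groups \
    --tag-keys "$1" \
    --tag-values "$2"
}
```

Usage: describe_named_group default, or describe_groups_by_tag environment dev.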

The following command returns information about all cluster security groups in the account:

aws redshift describe-cluster-security-groups

By default, the AWS CLI returns the result in JSON format.
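If you only need part of that JSON, the global --query option accepts a JMESPath expression. This sketch, with a hypothetical function name, lists just the security group names:

```shell
#!/bin/sh
# Hypothetical helper: print only the names of all cluster security groups.
list_security_group_names() {
  # Project the ClusterSecurityGroupName field out of every array element
  aws redshift describe-cluster-security-groups \
    --query 'ClusterSecurityGroups[].ClusterSecurityGroupName' \
    --output text
}
```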

Describing Clusters in AWS CLI Redshift

By describing your clusters, you can get the general cluster properties. This is possible using the describe-clusters command. The command can be used as follows:

aws redshift describe-clusters

The above command will return a description of all clusters in your account. 

To change the output format from JSON to text, you can use the --output text option as shown below:

aws redshift describe-clusters --output text

The command then returns the output as tab-separated text.

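For a quick overview, --query can also pick a few fields per cluster, and --output table renders them as a table. A sketch with a hypothetical function name; the four fields chosen are just an example:

```shell
#!/bin/sh
# Hypothetical helper: one table row per cluster with selected fields.
cluster_summary() {
  # Each inner list [..] becomes one row of the rendered table
  aws redshift describe-clusters \
    --query 'Clusters[].[ClusterIdentifier,NodeType,NumberOfNodes,ClusterStatus]' \
    --output table
}
```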

Modifying a Cluster in AWS CLI Redshift

You can change cluster settings with the modify-cluster command. For example, you can change the node type and the number of nodes to scale the cluster up or down. You can also change the admin user password and associate the cluster with a security group or parameter group.
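Scaling with modify-cluster can be sketched by passing a new node type and node count. This is a minimal sketch: the function name is hypothetical, and during a resize the cluster may be read-only or briefly unavailable, so plan it for a quiet window. AWS also provides a separate resize-cluster command for elastic resize; check the CLI reference for which method suits your node types.

```shell
#!/bin/sh
# Hypothetical helper: resize cluster $1 to $3 nodes of type $2.
resize_cluster() {
  cluster_id=$1
  node_type=$2
  node_count=$3
  # A multi-node cluster needs at least two nodes
  if [ "$node_count" -lt 2 ]; then
    echo "multi-node clusters need at least 2 nodes" >&2
    return 1
  fi
  aws redshift modify-cluster \
    --cluster-identifier "$cluster_id" \
    --cluster-type multi-node \
    --node-type "$node_type" \
    --number-of-nodes "$node_count"
}
```

Usage: resize_cluster mysamplecluster dc2.large 4.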

For example, the command given below shows how to associate a cluster with a cluster security group:

aws redshift modify-cluster --cluster-identifier mysamplecluster --cluster-security-groups mysamplesecuritygroup

The cluster named mysamplecluster is now a member of the mysamplesecuritygroup cluster security group.

The following command demonstrates how to modify the master password for a cluster:

aws redshift modify-cluster --cluster-identifier mysamplecluster --master-user-password NewPassw0rd1
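The new password must meet the same rules as at creation: 8 to 64 characters, at least one uppercase letter, one lowercase letter, and one number, and no single quotes, double quotes, backslashes, slashes, @, or spaces. The sketch below checks these rules locally before calling modify-cluster and reads the password from an environment variable. Both function names and the NEW_REDSHIFT_PASSWORD variable are hypothetical, and the local check is only a convenience; AWS performs the authoritative validation.

```shell
#!/bin/sh
# Hypothetical local check of the master password rules (8-64 chars, at
# least one uppercase letter, one lowercase letter and one digit, and none
# of: ' " \ / @ space). Other non-printable characters are not checked.
is_valid_master_password() {
  pw=$1
  if [ "${#pw}" -lt 8 ] || [ "${#pw}" -gt 64 ]; then
    return 1
  fi
  case $pw in *[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*) ;; *) return 1 ;; esac
  case $pw in *[abcdefghijklmnopqrstuvwxyz]*) ;; *) return 1 ;; esac
  case $pw in *[0123456789]*) ;; *) return 1 ;; esac
  case $pw in
    *"'"*|*'"'*|*'\'*|*/*|*@*|*" "*) return 1 ;;
  esac
  return 0
}

# Hypothetical wrapper: validate, then change the password of cluster $1
change_master_password() {
  : "${NEW_REDSHIFT_PASSWORD:?set NEW_REDSHIFT_PASSWORD first}"
  if ! is_valid_master_password "$NEW_REDSHIFT_PASSWORD"; then
    echo "password does not meet the Redshift rules" >&2
    return 1
  fi
  aws redshift modify-cluster \
    --cluster-identifier "$1" \
    --master-user-password "$NEW_REDSHIFT_PASSWORD"
}
```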

Deleting a Cluster in AWS CLI Redshift

To delete a cluster, use the delete-cluster command. Note that deletion cannot be undone.

If you might need the cluster's data in the future, do not skip the final snapshot: leave SkipFinalClusterSnapshot at its default of false (in the CLI, omit --skip-final-cluster-snapshot) and specify a name with --final-cluster-snapshot-identifier. You can later restore this snapshot into a new cluster.
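Restoring such a snapshot uses the restore-from-cluster-snapshot command, which creates a new cluster. A minimal sketch, assuming a configured AWS CLI; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: restore snapshot $2 into a NEW cluster named $1,
# then wait until the restore has finished.
restore_cluster_from_snapshot() {
  aws redshift restore-from-cluster-snapshot \
    --cluster-identifier "$1" \
    --snapshot-identifier "$2" || return 1
  # Waiter polls describe-clusters until the restore reports completed
  aws redshift wait cluster-restored --cluster-identifier "$1"
}
```

Usage: restore_cluster_from_snapshot mysamplecluster-restored finalsnapshot.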

The following command demonstrates how to delete a cluster without creating a final snapshot:

aws redshift delete-cluster --cluster-identifier mysamplecluster --skip-final-cluster-snapshot

The following command shows how to delete a cluster while specifying the final cluster snapshot:

aws redshift delete-cluster --cluster-identifier mysamplecluster --final-cluster-snapshot-identifier finalsnapshot
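Deletion runs asynchronously, so the command returns before the cluster is gone. The sketch below deletes with a final snapshot and then uses the CLI waiters to block until the cluster has been deleted and the snapshot is available. The function name is hypothetical, and a configured AWS CLI is assumed.

```shell
#!/bin/sh
# Hypothetical wrapper: delete cluster $1, keeping a final snapshot named $2.
delete_with_final_snapshot() {
  cluster_id=$1
  snapshot_id=$2
  aws redshift delete-cluster \
    --cluster-identifier "$cluster_id" \
    --final-cluster-snapshot-identifier "$snapshot_id" || return 1
  # Block until describe-clusters no longer finds the cluster
  aws redshift wait cluster-deleted --cluster-identifier "$cluster_id" || return 1
  # Block until the final snapshot reports as available
  aws redshift wait snapshot-available --snapshot-identifier "$snapshot_id"
}
```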

That is how to work with the AWS CLI Redshift. 

Conclusion

This is what you’ve learned in this article. Redshift is a cloud data warehouse solution developed by Amazon. It runs on AWS infrastructure. When using Redshift, you will need to perform administrative tasks. You can do this by running commands on AWS CLI Redshift. You can use the AWS CLI Redshift to perform tasks such as creating, modifying, and deleting Redshift clusters.

Some of the cluster parameters that can be modified via the CLI include the type and number of nodes, the admin user password, the security group, and more. Deleting a cluster cannot be undone. To shut down a cluster while keeping its data for future use, leave SkipFinalClusterSnapshot set to false and give the final snapshot a name with FinalClusterSnapshotIdentifier.

However, as a developer, extracting complex data from a diverse set of data sources like Databases, CRMs, Project Management Tools, Streaming Services, and Marketing Platforms into your Amazon Redshift data warehouse can be quite challenging. If you are from a non-technical background or are new to data warehousing and analytics, Hevo Data can help!


Hevo Data will automate your data transfer process, allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 100+ sources to Cloud-based Data Warehouses like Snowflake, Google BigQuery, Amazon Redshift, etc. It will provide you with a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
