A Comprehensive Amazon Redshift Tutorial 101

• October 12th, 2020


Redshift is Amazon’s Data Warehouse solution. It provides its users with a Cloud data storage option. Redshift is highly scalable, so you can use it to store huge volumes of data running up to petabytes in size. Redshift also offers strong security controls for your data. 

If you want to learn more about Redshift but you don’t know where to start, you’ve come to the right place. In this Redshift tutorial, I will be taking you through some important concepts in Redshift to help you get started with it. 


Prerequisites

This is what you need for this article:

  • An AWS account (Part 2 below walks you through signing up if you don’t have one). 
  • An open port in your firewall so that you can connect to your cluster. 
  • Optionally, an SQL client tool such as SQL Workbench/J for running queries. 

Part 1: What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It allows you to begin with a few gigabytes of data and then scale to a petabyte or more over time. Redshift stores data in clusters of nodes that process queries in parallel, which is why data in Redshift can be accessed quickly and with ease. Each node can be accessed independently by users and applications. 

You can use Redshift with a wide variety of data sources and data analytics tools, and it can be integrated with many existing SQL-based clients. Its architecture also makes it easy to integrate the platform with many business intelligence tools. 

Each Redshift data warehouse is fully managed, meaning that administrative tasks like creating backups, security, and configuration are completely automated. 

Redshift was designed for big data, so it scales easily thanks to its modular design. Its multi-layered structure makes it easy to process multiple queries simultaneously. 

Each node in a Redshift cluster is further divided into slices, each of which processes a portion of that node’s data in parallel, providing more granular access to data sets.

Part 2: Redshift Account Setup

Before creating a Redshift cluster, you must do the following:

  • Sign up for AWS. 
  • Determine the firewall rules. 

If you already have an AWS account, you are good to go. If you don’t have an AWS account, visit the AWS website and sign up for one. 

You will be asked to provide your personal details and billing information. You will also have to verify your account by receiving a phone call and entering a verification code on the phone keypad. 

When launching an Amazon Redshift cluster, you will specify a port number. You should also create an inbound rule in a security group to allow access through that port. 

If your computer is behind a firewall, you must know an open port that you can use. The open port will allow you to connect to the cluster from an SQL client tool and execute queries. 
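If you want a quick way to confirm that your firewall actually lets traffic through, a small connectivity check works. Below is a minimal sketch in Python; the cluster endpoint shown is a placeholder, and the port assumes Redshift’s default of 5439.

```python
import socket

# Hypothetical values: replace with your cluster endpoint and the port you chose.
CLUSTER_ENDPOINT = "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com"
PORT = 5439  # Redshift's default port

def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"Port {PORT} reachable: {port_is_reachable(CLUSTER_ENDPOINT, PORT)}")
```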

Hevo, A Simpler Alternative to Integrate your Data for Analysis

Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

GET STARTED WITH HEVO FOR FREE

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you always have analysis-ready data.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support call.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

Simplify your data analysis with Hevo today!

SIGN UP HERE FOR A 14-DAY FREE TRIAL!

Part 3: Create an IAM Role

To perform an operation that will access data in any other AWS resource, your cluster will need permission to access the resource and the data on that resource. 

A good example is when you need to execute a COPY command to load data from Amazon S3. You can grant those permissions using AWS Identity and Access Management (IAM). You can either use an IAM role that is attached to your cluster or the AWS access keys of an IAM user with the necessary permissions. 
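For context, here is a rough sketch of what such a COPY looks like when submitted through the Amazon Redshift Data API with boto3. The cluster name, database, table, S3 bucket, and role ARN are all hypothetical placeholders; substitute your own values.

```python
import boto3

# Hypothetical identifiers: substitute your own cluster, database, bucket, and role ARN.
CLUSTER_ID = "examplecluster"
DATABASE = "dev"
DB_USER = "awsuser"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/myRedshiftRole"

# COPY loads data from Amazon S3 into a Redshift table, using the IAM role
# attached to the cluster for read access to the bucket.
copy_sql = f"""
    COPY sales
    FROM 's3://example-bucket/sales/'
    IAM_ROLE '{IAM_ROLE_ARN}'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

client = boto3.client("redshift-data", region_name="us-west-2")
response = client.execute_statement(
    ClusterIdentifier=CLUSTER_ID,
    Database=DATABASE,
    DbUser=DB_USER,
    Sql=copy_sql,
)
print("Statement submitted, id:", response["Id"])
```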

The following steps can help you create an IAM role for AWS Redshift:

Step 1: Sign in to the AWS Management Console and open the IAM console.

Step 2: Select “Roles” from the navigation pane. 

Step 3: Select “Create role”. 


Step 4: Select “Redshift” in the “AWS service” group. 

Step 5: Under “Select your use-case”, select “Redshift-Customizable” and then choose “Next: Permissions”. 

Step 6: On the “Attach permission policies” page, select “AmazonS3ReadOnlyAccess” and then choose “Next: Tags”. 

Step 7: The “Add tags” page will open. Add tags if you wish, then choose “Next: Review”. 

Step 8: Enter a name for your role in the “Role name” field. You can use the name “myRedshiftRole”. 

Step 9: Review the role information, then select “Create Role”. 

Step 10: Select the name of the role that you have created. 

Step 11: Copy the Role ARN to your clipboard. 

Now that the role is ready, you should attach it to your cluster. 
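If you prefer to script this instead of clicking through the console, the boto3 sketch below creates a role with the same trust relationship and managed policy described above. It is a minimal example and assumes your AWS credentials are already configured locally; the role name mirrors the one used in the walkthrough.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets Amazon Redshift assume the role on your behalf.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="myRedshiftRole",
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
    Description="Allows Redshift to read data from Amazon S3",
)

# Attach the same managed policy selected in the console walkthrough.
iam.attach_role_policy(
    RoleName="myRedshiftRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

print("Role ARN:", role["Role"]["Arn"])
```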

Part 4: Create a Redshift Cluster

The following steps can help you create an Amazon Redshift cluster:

Step 1: Sign in to the AWS Management Console and open the Amazon Redshift console.

Step 2: Select the AWS region in which you want to create the cluster. 

Step 3: Choose “CLUSTERS” from the navigation menu, then select “Create cluster”. 


The “Create cluster” page will open. 

Step 4: Specify the values for Cluster identifier, Node type, and Nodes. 

Step 5: In the section for Database Configurations, specify the values for Database name (optional), Database port (optional), Master user name, and Master user password. 

Step 6: You can also select the IAM role that you created. 

Step 7: Click “Create cluster”. 
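The same cluster can also be created programmatically. Below is a minimal boto3 sketch that mirrors the console fields above; every identifier and credential in it is a placeholder you should replace.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-west-2")

# Hypothetical values mirroring the console fields above; adjust to your needs.
response = redshift.create_cluster(
    ClusterIdentifier="examplecluster",      # Cluster identifier
    NodeType="dc2.large",                    # Node type
    NumberOfNodes=2,                         # Number of nodes
    DBName="dev",                            # Database name (optional in the console)
    Port=5439,                               # Database port (optional in the console)
    MasterUsername="awsuser",                # Master user name
    MasterUserPassword="ChangeMe1234",       # Master user password
    IamRoles=["arn:aws:iam::123456789012:role/myRedshiftRole"],
)
print("Cluster status:", response["Cluster"]["ClusterStatus"])
```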

Part 5: Authorize Access to the Cluster

Before launching your cluster, you should first configure a security group to authorize access. 

Follow the steps given below:

Step 1: Choose “Clusters” from the navigation pane of the Amazon Redshift Console. 

Step 2: Select the cluster then open the Configuration tab. 

Step 3: Under Cluster Properties, select the security group listed under VPC Security Groups. 

Step 4: Once your security group is opened in the Amazon EC2 Console, open the “Inbound” tab. 


Step 5: Choose Edit, Add Rule, and then enter the information given below:

  • Type: Custom TCP Rule
  • Protocol: TCP
  • Port Range: Enter the same port number that you entered when launching the cluster. The default port for Redshift is 5439, but you may be using a different port.
  • Source: Choose Custom, then enter 0.0.0.0/0. (Note that 0.0.0.0/0 allows connections from any IP address; use a narrower CIDR range in production.)

Click the “Save” button. 

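If you would rather add the inbound rule from a script, the boto3 sketch below does the equivalent of the console steps above. The security group ID is a placeholder, and the rule assumes the default Redshift port of 5439.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Hypothetical security group ID: use the one attached to your cluster's VPC.
SECURITY_GROUP_ID = "sg-0123456789abcdef0"

# Open the cluster's port (5439 by default) for inbound TCP traffic.
# 0.0.0.0/0 allows any IP address; restrict the CIDR range in production.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,
        "ToPort": 5439,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Redshift access"}],
    }],
)
print("Inbound rule added.")
```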

Part 6: Connect to the Cluster and Execute Queries

If you need to execute queries against the databases hosted by your Redshift cluster, you can use either of these two approaches:

Option 1: Establish a connection to the cluster and execute queries on the AWS Management Console with the query editor. 

Option 2: Establish a connection to the cluster using an SQL client tool like SQL Workbench/J. 

The query editor is the simplest way of running queries against the databases hosted by your cluster. 

Once you create the cluster, you can immediately execute queries using the console. 

If you decide to use an SQL Client, you must install the SQL Client drivers and tools to help you connect to the cluster. 

Examples of such drivers and tools include the JDBC and ODBC drivers. 
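As a simple illustration of the SQL client route, the sketch below connects with psycopg2 (Redshift is compatible with the PostgreSQL wire protocol) and runs a test query. The endpoint and credentials are placeholders.

```python
import psycopg2  # Redshift speaks the PostgreSQL wire protocol

# Hypothetical connection details: use your cluster endpoint and credentials.
conn = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="ChangeMe1234",
)

with conn, conn.cursor() as cur:
    # A simple query to confirm the connection works.
    cur.execute("SELECT current_database(), current_user;")
    print(cur.fetchone())

conn.close()
```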

Part 7: Limitations

The following are the limitations of AWS Redshift:

  1. Limited support for parallel upload- Redshift can quickly load data from Amazon S3, Amazon EMR, and Amazon DynamoDB using its massively parallel processing architecture. 

However, Redshift does not support parallel loading from other sources. 

Due to that, you may be forced to use JDBC inserts, ETL solutions, or scripts to load the data. 

  2. Doesn’t enforce uniqueness- Redshift does not provide a way of enforcing uniqueness on inserted data. 

If you have a distributed system writing data to Redshift, you will have to handle uniqueness yourself, for example with a staging-table pattern like the one sketched below. 
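One common workaround is to load new rows into a staging table, delete any matching keys from the target table, and then insert only one row per key. The sketch below shows this idea via the Redshift Data API; the cluster, table, and column names are all hypothetical.

```python
import boto3

client = boto3.client("redshift-data", region_name="us-west-2")

# Hypothetical tables: new rows arrive in sales_staging, the target is sales.
# Redshift declares but does not enforce PRIMARY KEY constraints, so the
# loading process itself has to remove duplicates.
statements = [
    # Drop target rows whose keys are about to be re-inserted.
    """
    DELETE FROM sales
    USING sales_staging
    WHERE sales.sale_id = sales_staging.sale_id;
    """,
    # Keep only one row per key from the staging table.
    """
    INSERT INTO sales (sale_id, sale_date, amount)
    SELECT sale_id, sale_date, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY sale_id ORDER BY sale_date DESC) AS rn
        FROM sales_staging
    ) dedup
    WHERE rn = 1;
    """,
    # Clear the staging table for the next load.
    "DELETE FROM sales_staging;",
]

# BatchExecuteStatement runs the statements as a single transaction.
client.batch_execute_statement(
    ClusterIdentifier="examplecluster",
    Database="dev",
    DbUser="awsuser",
    Sqls=statements,
)
```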

Part 8: Use Hevo Data

Hevo Data provides its users with a simpler platform for integrating data for analysis. 

It is a no-code data pipeline that can help you combine data from multiple sources. 

It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have analysis-ready data in your desired destination. 

This leaves you free to focus on key business needs and perform insightful analysis using BI tools. 

Conclusion

In this Redshift tutorial, you’ve learned what Amazon Redshift is and how to get started with it. 

Moreover, extracting complex data from a diverse set of data sources can be quite challenging. A simpler alternative like Hevo is the right solution for you! 

Hevo Data is a No-Code Data Pipeline that offers a faster way to move data from 100+ Data Sources including 40+ Free Sources, into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

VISIT OUR WEBSITE TO EXPLORE HEVO

Want to take Hevo for a spin?

SIGN UP and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience with this Redshift tutorial in the comments section below!
