Load Data from JSON to Redshift: 2 Easy Methods

on Amazon Redshift, Data Integration, Data Warehouse, ETL, ETL Tutorials, Tutorials • October 22nd, 2021

In this digitized world, almost everything generates data, and storing that data correctly for analytics has become a crucial task for Data Engineers. Applications generate data in their own formats, the most common of which is JSON. JSON is popular because of its simplicity and its ease of integration with most widely used applications. In this blog post, we discuss how you can efficiently load data from JSON to Redshift.

Introduction to JSON Data

The JSON data structure is a key-value pair, and it usually contains a set of JSON objects and arrays. 

A JSON object is an unordered collection of key-value pairs that begins and ends with braces. Each key is enclosed in double quotes and separated from its value by a colon, and commas separate the pairs from each other.

A JSON array is an ordered collection of values (often JSON objects) separated by commas; the array begins and ends with square brackets.

JSON objects and arrays can be nested to form a hierarchical data structure. Below is a sample JSON object that contains a nested array of objects.

{
  "menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}
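To see the object/array nesting concretely, here is a small Python sketch that parses the sample above with the standard json module:

```python
import json

# The sample JSON object from above: an object containing a nested array of objects.
sample = '''
{
  "menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}
'''

doc = json.loads(sample)   # parse the object into a Python dict
items = doc["menuitem"]    # the nested array becomes a Python list

for item in items:
    print(item["value"], "->", item["onclick"])
```

Each element of the nested array is itself a JSON object, which is exactly the shape Redshift's JSON loading tools have to flatten into rows and columns.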

Introduction to Redshift


AWS Redshift is a cloud-based data warehouse provided by Amazon as a part of Amazon Web Services. It is a fully managed and cost-effective data warehouse solution. AWS Redshift is designed to store petabytes of data and can perform real-time analysis to generate insights.

AWS Redshift is a column-oriented database: it stores data in a columnar format, as compared to traditional databases that store it row by row. Amazon Redshift has its own compute engine to perform computations and generate critical insights.

AWS Redshift costs around $1,000/TB/year, making it very cost-effective compared to traditional data warehouses.

To know more about AWS Redshift, follow the official documentation here.

Hevo Data: Load your Data in Redshift Seamlessly

Hevo Data is a No-code Data Pipeline. It can help you integrate data from 100+ sources and load it into a data warehouse of your choice, including Redshift, BigQuery, Snowflake, etc. Hevo is a fully managed platform that completely automates the process of not only loading data from your desired source but also enriching and transforming data into an analysis-ready form without writing a single line of code. Its fault-tolerant architecture ensures that the data is handled securely and consistently with zero data loss.

GET STARTED WITH HEVO FOR FREE

Some benefits of Hevo are listed below:

  1. Simple: Hevo offers a simple and intuitive user interface to the users. It has a minimal learning curve. 
  2. Secure: Hevo provides end-to-end encryption and two-factor authentication and makes sure your data is secure.
  3. Scalability: Hevo is built to handle millions of records per minute without any latency. 
  4. Real-Time: Hevo provides real-time data migration. So, your data is always ready for analysis.
  5. Fully Automated: Hevo can be set up in a few minutes and requires zero maintenance and management. 
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Simplify your Data Analysis with Hevo today! 

SIGN UP HERE FOR A 14-DAY FREE TRIAL!

Methods of Loading Data from JSON to Redshift

There are two ways of loading data from JSON to Redshift:

  1. Method 1: Load Using Redshift Copy Command
  2. Method 2: Load Using AWS Glue

Let’s explore each option to load data from JSON to Redshift in detail.

Method 1: Load Using Redshift Copy Command

Redshift has a built-in COPY command that moves data from AWS S3 into the Redshift warehouse. It uses AWS S3 as the source and can load all common file types, including CSV, Parquet, and JSON.

Let’s have a look at the step-by-step approach to copying a JSON file from AWS S3 to Redshift:

Step 1: Create and Upload JSON File to S3

Let’s see how you will create and upload JSON file to S3.

  1. Create a sample JSON file named employee.json and add a few sample employee records as below.
{"emp_id" : 1, "emp_name" : "Sam", "emp_dept" : "HR"}
{"emp_id" : 2, "emp_name" : "Jib", "emp_dept" : "Tech"}
{"emp_id" : 3, "emp_name" : "Nick", "emp_dept" : "IT"}
{"emp_id" : 4, "emp_name" : "Joseph", "emp_dept" : "Tech"}
  2. Upload the file to AWS S3 using the AWS CLI.
Syntax - aws s3 cp <source file> <target S3 path>
 
Example - aws s3 cp employee.json s3://test-bucket/json/
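Note that the file holds one JSON object per line (newline-delimited JSON), which is the shape Redshift's COPY expects, rather than a single top-level array. As a hedged sketch, you could generate such a file in Python like this (the records and filename match the example above):

```python
import json

# Sample employee records; field names match the table used later in this post.
employees = [
    {"emp_id": 1, "emp_name": "Sam",    "emp_dept": "HR"},
    {"emp_id": 2, "emp_name": "Jib",    "emp_dept": "Tech"},
    {"emp_id": 3, "emp_name": "Nick",   "emp_dept": "IT"},
    {"emp_id": 4, "emp_name": "Joseph", "emp_dept": "Tech"},
]

# Dump each record on its own line (newline-delimited JSON),
# not as one top-level JSON array.
ndjson = "\n".join(json.dumps(rec) for rec in employees)

with open("employee.json", "w") as f:
    f.write(ndjson)
```

The resulting employee.json can then be uploaded with the aws s3 cp command shown above.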

Step 2: Create JSONPath File

JSONPath is an expression that specifies the path to a single element in a hierarchical JSON structure. Amazon Redshift doesn’t support JSONPath features such as wildcard characters or filter expressions; each element must be referenced by its complete, explicit JSONPath, so every field you want to load has to be listed out individually.

  1. Create the JSONPath file for the above JSON. Copy the below content and save it in a file as employee_jsonpath.json.
{
  "jsonpaths": [
    "$.emp_id",
    "$.emp_name",
    "$.emp_dept"
  ]
}
  2. Use the AWS CLI to upload the file into S3.
Syntax - aws s3 cp <source file> <target S3 path>
 
Example - aws s3 cp employee_jsonpath.json s3://test-bucket/jsonpaths/
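Conceptually, Redshift resolves each path in the jsonpaths file against every record, in order, to produce one row of column values per record. The toy resolver below (a sketch, not Redshift's implementation, and handling only the simple "$.field" form used here) illustrates the mapping:

```python
import json

# The jsonpaths file from above, and one record from employee.json.
jsonpaths_file = {
    "jsonpaths": ["$.emp_id", "$.emp_name", "$.emp_dept"]
}
record = {"emp_id": 1, "emp_name": "Sam", "emp_dept": "HR"}

def resolve(path, obj):
    # Toy resolver for the simple "$.field" paths used above; Redshift
    # supports dot and bracket notation but no wildcards or filters.
    assert path.startswith("$.")
    return obj[path[2:]]

# Each record becomes one ordered row of column values.
row = tuple(resolve(p, record) for p in jsonpaths_file["jsonpaths"])
print(row)
```

The order of the paths in the file determines the column order, which is why it must line up with the table definition created in the next step.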

Step 3: Load the Data into Redshift

Let’s see the steps to load the data into Redshift.

  1. First, create the table DDL in Redshift that will hold the schema of the JSON.
CREATE TABLE employee(emp_id INT, emp_name VARCHAR(50), emp_dept VARCHAR(50), primary key(emp_id));
  2. Load the data from AWS S3 into Redshift using the Redshift COPY command.
copy employee
from 's3://test-bucket/json/employee.json'
credentials 'aws_iam_role=arn:aws:iam::xxxxx:role/Redshift'
region 'eu-west-3'
json 's3://test-bucket/jsonpaths/employee_jsonpath.json';
  3. You will get a success message when the above command finishes execution.
  4. Use the below command to check the loaded data.
select * from employee;

Voila! You have successfully loaded the data from JSON to Redshift using the COPY command.
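The same COPY statement can also be issued from a script instead of a SQL console. As a minimal sketch (the bucket, role ARN, and region are placeholders and must match your own setup), you could build the statement programmatically and then run it with any PostgreSQL driver connected to your cluster:

```python
# Build the COPY statement programmatically; bucket, role ARN, and region
# are placeholders and must match your own setup.
def build_copy(table, data_path, role_arn, region, jsonpaths_path):
    return (
        f"copy {table}\n"
        f"from '{data_path}'\n"
        f"credentials 'aws_iam_role={role_arn}'\n"
        f"region '{region}'\n"
        f"json '{jsonpaths_path}';"
    )

sql = build_copy(
    table="employee",
    data_path="s3://test-bucket/json/employee.json",
    role_arn="arn:aws:iam::xxxxx:role/Redshift",
    region="eu-west-3",
    jsonpaths_path="s3://test-bucket/jsonpaths/employee_jsonpath.json",
)
print(sql)
# The statement can then be executed against Redshift with any
# PostgreSQL-compatible driver, e.g. psycopg2.
```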

Method 2: Load Using AWS Glue

In this example, you will use sensor data to demonstrate loading JSON data from AWS S3 to Redshift. Here is the outline of this section:

  1. Pre-requisites
  2. Step 1: Create a JSON Crawler
  3. Step 2: Create Glue Job

Pre-requisites

  1. Understanding and working knowledge of AWS S3, Glue, and Redshift.
  2. Active AWS account, with full access roles for S3, Glue, and Redshift.

Step 1: Create a JSON Crawler

Let’s see the steps to create a JSON crawler:

  1. Log in to the AWS account, and select AWS Glue from the service drop-down.
  2. First, you need to define a Classifier so that each JSON record loads into a single row in Redshift. Otherwise, Redshift will load the entire JSON as a single record, which isn’t useful for analysis.
  3. To create a Classifier, on the left menu of the Glue page, select Crawlers > Classifiers, and then click on Add Classifier.
JSON to Redshift: Create Classifier
  4. Provide the classifier name, select the Classifier type as JSON, and provide the JSONPath as $[*].
JSON to Redshift: Add Classifier
  5. Once you add the classifier, click on Crawlers and then Add Crawler to fetch JSON files from S3.
JSON to Redshift: Create Crawler
  6. Set the Crawler name, and select the Custom Classifier created above.
JSON to Redshift: Add Crawler
  7. Choose the data source as S3, and provide the S3 location where the files are located.
JSON to Redshift: Choose S3
  8. Select the appropriate IAM role to allow Glue to access S3 and Redshift.
JSON to Redshift: Select IAM role
  9. Once you specify all the required roles, select the Schedule for the crawler to run. Choose the “Run on Demand” option.
JSON to Redshift: Click Run on demand
  10. Next, select the output for the Crawler: click Add Database, specify the name of the database, and click Finish.
JSON to Redshift: Add Database
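To make the classifier's $[*] JSONPath concrete: it tells Glue that each element of a top-level JSON array is its own record. The Python sketch below (with made-up sensor readings as sample data) emulates that behavior with the standard json module:

```python
import json

# A source file whose top level is a JSON array, e.g. sensor readings
# (hypothetical sample data for illustration).
source = '[{"sensor": "s1", "temp": 21}, {"sensor": "s2", "temp": 19}]'

# The classifier JSONPath $[*] means "each element of the top-level array":
# every array element becomes its own record (and, later, its own Redshift row).
records = json.loads(source)

for rec in records:
    print(rec)
```

Without the classifier, the whole array would be treated as one record, which is exactly the single-row problem described in step 2 above.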

Step 2: Create Glue Job

Now that you have created the Crawler that will crawl the S3 files and load their schema into a database, you need to create a Glue job that will read the crawled JSON, parse it, and load it into Redshift. Follow the below steps to create the Glue job.

  1. From the left pane, under the ETL section, select Jobs and then click on Add Job.
JSON to Redshift: Add job
  2. Specify the name of the Job, select the appropriate AWS Role, and click Next, keeping everything else as default.
  3. Now, in the input data source, select the table name that contains the crawled JSON schema.
JSON to Redshift: Choose Data Source
  4. For the target location, choose JDBC as the Data Store and click on Add connection.
JSON to Redshift: Add Connection
  5. In the Add new connection dialog box, select Amazon Redshift as the connection type and enter all the required database credentials.
JSON to Redshift: Add Connection
  6. Check the source-to-target Mapping.
JSON to Redshift: Source to Target Mapping
  7. Click on Next, then Save, and then click on Run Job.
JSON to Redshift: Run job
  8. Once the job completes, navigate to Services and select Amazon Redshift.
  9. On Redshift, run a query to check the data.
JSON to Redshift: Run Query

Conclusion

In this article, you went through a detailed, step-by-step approach to loading data from JSON to Redshift. However, the above-mentioned methods require programming knowledge and an understanding of the tools involved. If you’re looking for an all-in-one solution that will not only help you transfer data but also transform it into an analysis-ready form, then Hevo Data is the right choice for you!

Hevo is a No-code Data Pipeline. It will take care of all your analytics needs in a completely automated manner, allowing you to focus on key business activities. It also supports pre-built integration from 100+ data sources at a reasonable price.

VISIT OUR WEBSITE TO EXPLORE HEVO

Want to take Hevo for a spin?

SIGN UP and experience the feature-rich Hevo suite first hand.

Share your experience of loading data from JSON to Redshift in the comment section below.
