JSON is popular because of its simplicity and the ease with which most widely used applications can produce and consume it. Redshift's advanced storage and query capabilities make it a natural destination for replicating that data. You can easily load data from JSON to Redshift via Amazon S3 or directly using third-party Data Integration tools.

Redshift also provides in-built SQL commands to carry out the data loading process, and you can use other AWS services, such as AWS Glue, to load data from JSON to Redshift. The question is: which of these methods suits your use case? We have got you covered in this blog on JSON to Redshift using three different methods.

What is JSON Data?

JSON data is built from key-value pairs and typically consists of a set of JSON objects and arrays.

A JSON object is an unordered collection of key-value pairs that begins and ends with braces. Each key is a string enclosed in double quotes and separated from its value by a colon; commas separate the pairs from each other.

A JSON array is an ordered collection of values (often JSON objects) separated by commas; the array begins and ends with square brackets.

JSON objects and arrays can be nested to form hierarchical data structures. Below is a sample JSON object that contains a nested array of objects.

{
  "menuitem": [
    {"value": "New", "onclick": "CreateNewDoc()"},
    {"value": "Open", "onclick": "OpenDoc()"},
    {"value": "Close", "onclick": "CloseDoc()"}
  ]
}

What is Redshift?

AWS Redshift is a cloud-based data warehouse provided by Amazon as a part of Amazon Web Services. It is a fully managed and cost-effective data warehouse solution. AWS Redshift is designed to store petabytes of data and can perform real-time analysis to generate insights.

AWS Redshift is a column-oriented database: it stores data in a columnar format, whereas traditional databases store it row by row. Amazon Redshift has its own compute engine to perform computations and generate critical insights.

At roughly $1,000/TB/year, AWS Redshift is also very cost-effective compared to traditional data warehouses.

Key Features of Redshift

Developers at Amazon have continuously improved Redshift over the years. Here are some of the eye-catching features that make Amazon Redshift a popular choice:

  • Scalability: Giving you an edge over traditional Data Warehouses, Redshift allows On-demand horizontal petabyte scalability. With just a few clicks on Amazon Web Services Console or via the Cluster API, you can scale up or down according to your business requirements. You can leverage the Amazon S3 data lake to store unlimited data in open data formats. 
  • Top-notch Performance: Performance-intensive workloads can be handled seamlessly by RA3 instances, which provide up to 3 times better performance than other alternatives. At no extra cost to you, RA3 instances include the Advanced Query Accelerator (AQUA) cache to return faster query results on large datasets. For repeated queries, Amazon delivers the result immediately from the saved cache.
  • Flexible Pricing: If you are just starting out, you can opt for hourly plans and scale to long-term plans later on. The scaling costs are also minimal for its customers. For scaling, you can use the Concurrency Scaling Credits that you earn every day from the clusters. You also get a choice between Dense Storage Nodes and Dense Compute Nodes. This allows you to optimally select the resources for your Workload and get the best price performance.  
  • Fully Managed: It is a completely managed service with all the mundane administrative tasks automated. Features such as Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze reduce manual effort and keep query performance high without delays. The built-in machine learning capabilities apply complex algorithms to dynamically improve query speed.
  • Reliability: Redshift is extremely fault-tolerant. The cluster’s health is continuously monitored and automatic data replication from defective storage units is carried out to prevent any data loss. Data is automatically backed up in Amazon S3. Within minutes you can recover your data in a few clicks on AWS Management Console or by using the Redshift APIs. 
  • Secure: Amazon Redshift is a compliant cloud service provider, meeting SOC 1, SOC 2, SOC 3, and PCI DSS Level 1 requirements. Your data is guarded at all times with hardware-accelerated AES-256 encryption at rest and SSL encryption in transit. You can also set access permissions for columns and rows for individuals and groups. You get more control as Redshift allows you to configure firewall settings and isolate your data warehouse cluster using Amazon Virtual Private Cloud.

Related: If you’re looking to enhance your knowledge of Redshift, we have a valuable resource on Redshift data storage that provides a comprehensive understanding of Redshift’s data storage capabilities. Additionally, for those interested in real-time analytics with Redshift, do check out real-time Redshift data analytics.

Explore These Methods to Load Data from JSON to Redshift

You can load JSON data into Amazon Redshift using its in-built SQL commands, other AWS services, or an automated Data Pipeline tool. To load JSON into Redshift, check out the following three methods:

Method 1: Using Hevo Data to Load JSON to Redshift in Minutes

Hevo Data is a No-code Data Pipeline. It can help you to integrate data from a vast sea of sources like FTP/SFTP and load it in a data warehouse of your choice, including Amazon Redshift, BigQuery, Snowflake, etc. Using Hevo you can easily upload files in formats such as CSV, JSON, and XML to your Redshift Data Warehouse.

Get Started with Hevo for Free

Hevo is a fully managed platform that completely automates the process of not only loading data from 150+ data sources but also enriching and transforming data into an analysis-ready form without writing a single line of code.

Method 2: Load JSON to Redshift using Copy Command

This method uses Redshift's in-built SQL commands: with the COPY command, you transfer your JSON data from AWS S3 to Redshift. It is only practical if you rarely need to import JSON data into Redshift.

Method 3: Load JSON to Redshift using AWS Glue

Users need technical and working knowledge of Amazon S3, AWS Glue, and Amazon Redshift to carry out the JSON to Redshift data loading process. You will also need an active AWS account with full-access roles for S3, Glue, and Redshift.

Methods of Loading Data from JSON to Redshift

There are three ways of loading data from JSON to Redshift:

  • Method 1: Load JSON to Redshift in Minutes using Hevo Data
  • Method 2: Load JSON to Redshift using the COPY Command
  • Method 3: Load JSON to Redshift using AWS Glue

Let's explore each option to load data from JSON to Redshift in detail.

Method 1: Load JSON to Redshift in Minutes using Hevo Data

Hevo Data is a No-code Data Pipeline solution that can help you move data from 150+ data sources like FTP/SFTP and Amazon S3 to a Data Warehouse such as Amazon Redshift, or to BI tools, in a completely hassle-free and automated manner. Using Hevo, you can easily upload files in formats such as CSV, JSON, and XML to your Redshift Data Warehouse.

To load JSON-formatted Source data to your desired Destination, Hevo allows you to select the JSON Data parsing strategy for the nested fields, objects, and arrays so that these are read correctly. For instance, to load data from JSON to Redshift, a default parsing strategy of replicating JSON fields to JSON columns is applied. To know more, you can check out the Hevo Documentation for Parsing Nested JSON Fields in Events.

Hevo also supports Amazon Redshift as a Source for loading data to a destination of your choice. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo Data takes care of all your data preprocessing needs and lets you focus on key business activities and draw more powerful insights into how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent and reliable solution to manage data in real time and always have analysis-ready data in your desired destination.

Take a look at some of the salient features of Hevo:

  • Exceptional Security: A Fault-tolerant Architecture that ensures Zero Data Loss.
  • Built to Scale: Exceptional Horizontal Scalability with Minimal Latency for Modern-data Needs.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the format of incoming data and replicates it to the destination schema. You can also choose between Full & Incremental Mappings to suit your Data Replication requirements.
  • Built-in Connectors: Support for 150+ Data Sources, including Databases, SaaS Platforms, Files & More. Native Webhooks & REST API Connector available for Custom Sources.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Blazing-fast Setup: Straightforward interface for new customers to work on, with minimal setup time.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

To effortlessly load data from JSON to Redshift using Hevo, follow the simple steps given below:

  • Step 1: Connect Hevo Data with your system by setting FTP/SFTP as a source. You can provide a unique name to your Pipeline along with information such as Port Number, Username, Password, etc. Hevo currently supports CSV, JSON, and XML formats. You can specify the file format as JSON for your use case.
Image: Configure your FTP/SFTP Source
  • Step 2: To complete the process of loading data from JSON to Redshift, provide your Redshift Data Warehouse credentials, such as your authorized Username and Password, along with your Host IP Address and Port Number. You will also need to provide a name for your database and a unique name for this destination.
Image: Configure your Redshift Destination

Method 2: Load JSON to Redshift Using Copy Command

To load your data from JSON to Redshift, you can use Redshift's in-built COPY command, which moves data from AWS S3 into your Redshift warehouse. It uses AWS S3 as the source and transfers the data from there into Redshift tables. The COPY command supports many file formats, including CSV, Parquet, and JSON.
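For orientation, here is the general shape of a COPY statement that loads JSON from S3. This is only a sketch: the table, bucket, IAM role, and region below are placeholders, not real resources.

copy target_table
from 's3://your-bucket/path/data.json'
iam_role 'arn:aws:iam::123456789012:role/YourRedshiftRole'
region 'us-east-1'
format as json 'auto';

The detailed, step-by-step version of this command is covered below.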

Download the Cheatsheet on How to Set Up High-performance ETL to Redshift
Learn the best practices and considerations for setting up high-performance ETL to Redshift

Let’s have a look at the step-by-step approach to copying the JSON file from AWS S3 to Redshift:

Step 1: Create and Upload JSON File to S3

To get started with the JSON to Redshift loading process, let’s see how to create and upload the JSON files to S3.

  • Step 1: Create a sample JSON file named employee.json and add a few sample records of the employee as below.
{"emp_id" : 1, "emp_name" : "Sam", "emp_dept" : "HR"}
{"emp_id" : 2, "emp_name" : "Jib", "emp_dept" : "Tech"}
{"emp_id" : 3, "emp_name" : "Nick", "emp_dept" : "IT"}
{"emp_id" : 4, "emp_name" : "Joseph", "emp_dept" : "Tech"}
  • Step 2: Upload the file to AWS S3 using AWS CLI.
Syntax - aws s3 cp <source file> <target S3 path>
 
Example - aws s3 cp employee.json s3://test-bucket/json/

Step 2: Create JSONPath File

JSONPath is an expression that specifies the path to a single element in a hierarchical JSON data structure. Amazon Redshift doesn’t support JSONPath features such as wildcard characters or filter expressions; each entry in the JSONPaths file must be a complete, unambiguous path to a single element. As a result, complex, multi-level structures have to be mapped element by element rather than parsed wholesale. (An alternative that avoids the JSONPaths file altogether is sketched at the end of this step.)

  1. Create the JSONPaths file for the above JSON. Copy the content below and save it in a file named employee_jsonpath.json. Note that each path must start with "$." followed by the field name.
{
  "jsonpaths": [
    "$.emp_id",
    "$.emp_name",
    "$.emp_dept"
  ]
}
  2. Use AWS CLI to upload the file into S3.
Syntax - aws s3 cp <source file> <target S3 path>
 
Example - aws s3 cp employee_jsonpath.json s3://test-bucket/jsonpaths/
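Side note: if your JSON field names already match the target table’s column names, you can skip the JSONPaths file altogether. COPY accepts json 'auto' (and json 'auto ignorecase'), which matches JSON fields to columns by name. A minimal sketch using this example’s bucket and the employee table created in the next step:

copy employee
from 's3://test-bucket/json/employee.json'
credentials 'aws_iam_role=arn:aws:iam::xxxxx:role/Redshift'
region 'eu-west-3'
json 'auto';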

Step 3: Load the Data into Redshift

Let’s look at the steps to load the data into Redshift.

  1. First, create the table DDL in Redshift; this table will hold the schema information for the JSON data.
CREATE TABLE employee(emp_id INT, emp_name VARCHAR(50), emp_dept VARCHAR(50), PRIMARY KEY(emp_id));
  2. Load the data from AWS S3 to Redshift using the Redshift COPY command.
copy employee
from 's3://test-bucket/json/employee.json'
credentials 'aws_iam_role=arn:aws:iam::xxxxx:role/Redshift'
region 'eu-west-3'
json 's3://test-bucket/jsonpaths/employee_jsonpath.json';
  3. You will get a success message when the above command finishes execution.
  4. Use the below command to check the loaded data.
select * from employee;

Voila! You have successfully loaded the data from JSON to Redshift using the COPY command.
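If a COPY command ever fails or skips rows, Redshift records the details in the STL_LOAD_ERRORS system table. A quick diagnostic query (a sketch; adjust the column list and limit as needed):

select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;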

How to Prevent Loss of Numeric Precision?

While using the COPY command, you may observe that loading numbers from a JSON-formatted data file into a column defined with a numeric data type can reduce accuracy. Some floating-point values are not represented exactly in computer systems, so the data copied from the JSON file may not be rounded as you expect. For better accuracy, you can opt for any of the following:

  • You can denote a number as a string by enclosing the value in double quotes.
  • Employ ROUNDEC to round numbers rather than truncating them (a sketch follows this list).
  • You can start using CSV, delimiters, or fixed-width text files instead of JSON or Avro files.
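A minimal sketch of how the ROUNDEC option is added to a COPY statement; the table, bucket, and IAM role below are placeholders used only for illustration:

copy sales_metrics
from 's3://your-bucket/json/metrics.json'
iam_role 'arn:aws:iam::123456789012:role/YourRedshiftRole'
json 'auto'
roundec;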

Method 3: Load JSON to Redshift using AWS Glue

In this JSON to Redshift data loading example, you will use sensor data to demonstrate loading JSON data from AWS S3 to Redshift. This section covers the prerequisites, creating a JSON crawler, and creating a Glue job.

Pre-requisites

  • Understanding and working knowledge of AWS S3, Glue, and Redshift.
  • Active AWS account, with full access roles for S3, Glue, and Redshift.

Step 1: Create a JSON Crawler

For starting the JSON to Redshift data loading process, let’s first go through the steps to create a JSON crawler:

  • Step 1: Log in to the AWS account, and select AWS Glue from the service drop-down.
  • Step 2: First, you need to define a Classifier so that each JSON record loads into a single row in Redshift. Otherwise, the crawler will treat the entire JSON file as a single record, which isn’t useful for analysis.
  • Step 3: To create a Classifier, on the left menu of the Glue page, select Crawlers > Classifiers, and then click on Add Classifier.
Image: Create Classifier
  • Step 4: Provide the classifier name, select the Classifier type as JSON, and set the JSON path to $[*].
Image: Add Classifier
  • Step 5: Once you add the classifier, click on Crawlers and then Add Crawler to fetch JSON files from S3.
Image: Create Crawler
  • Step 6: Set the Crawler name and select the Custom Classifier created above.
Image: Add Crawler
  • Step 7: Choose the data source as S3, and provide the location to S3 where files are located.
Image: Choose S3
  • Step 8: Select the appropriate IAM role to allow Glue to access S3 and Redshift.
Image: Select IAM role
  • Step 9: Once you have specified all the required roles, select the Schedule on which the crawler should run. Choose the “Run on demand” option.
Image: Click Run on demand
  • Step 10: Next, you have to select the output for the Crawler, select Add Database, and then specify the name of the database. Click Finish.
Image: Add Database

Step 2: Create Glue Job

Now that you have created a Crawler that crawls the S3 files and catalogs them in a database, you need to create a Glue job that reads the crawled JSON data, parses it, and loads it into Redshift. Follow the steps below to create the Glue job.

  • Step 1: From the left pane, under the ETL section, select Jobs and then click on Add Job.
Image: Add Job
  • Step 2: Specify the name of the Job, select an appropriate IAM Role, and click Next, keeping everything else as default.
  • Step 3: Now, in the input data source, select the table name that contains the crawled JSON schema.
Image: Choose Data Source
  • Step 4: For the target location, choose JDBC as the Data Store and click on Add connection.
Image: Add Connection
  • Step 5: In the Add new connection dialog box, select Amazon Redshift as the connection type and mention all the required database credentials.
Image: Add Connection (Amazon Redshift)
  • Step 6: Review the source-to-target mapping.
Image: Source to Target Mapping
  • Step 7: Click on Next, then Save and then click on Run Job.
Image: Run Job
  • Step 8: Once the job completes, navigate to services and select Amazon Redshift.
  • Step 9: In Redshift, run a query to check that the data has been successfully loaded from JSON to Redshift (a sample query follows below).
Image: Run Query
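For example, a simple verification query might look like the following. The table name sensor_data is an assumption used only for illustration; use whatever table your Glue job actually targets.

select count(*) from sensor_data;
select * from sensor_data limit 10;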

Conclusion

In this article, you went through a detailed, step-by-step approach to loading data from JSON to Redshift. However, the above-mentioned methods require some programming knowledge and an understanding of the tools involved. If you’re looking for an all-in-one solution that will not only help you transfer data but also transform it into an analysis-ready form, then Hevo Data is the right choice for you!

Hevo is a No-code Data Pipeline. It will take care of all your analytics needs in a completely automated manner, allowing you to focus on key business activities. It also supports pre-built integrations with 150+ data sources at a reasonable price.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of loading data from JSON to Redshift in the comment section below.
