Companies widely use Data Warehouses to run their business operations efficiently and to build data-driven strategies that increase revenue. Amazon Redshift is one of the most widely used Cloud Data Warehouses, offering a plethora of features that help businesses find new opportunities.

A Data Warehouse writes many records every day, and large sets of data often remain unsorted in a table. Companies also need to restructure table data from time to time. Both can be achieved via the Redshift Copy Table operation, in which a copy of the table is created and the data is sorted during the reload.

A Deep Copy makes it easy to perform an Amazon Redshift Copy Table operation, and it automatically sorts the table during the bulk insertion of data. In this article, you will learn about the Amazon Redshift Copy Table operation and the different methods to perform it via a Deep Copy.

Introduction to Amazon Redshift


Amazon Redshift is a Cloud-based Data Warehouse provided by Amazon as part of Amazon Web Services. It is a fully managed and cost-effective Data Warehouse solution, designed to store petabytes of data and perform real-time analysis to generate insights.

Amazon Redshift is a column-oriented database that stores data in a columnar format, unlike traditional databases that store it in row format. Amazon Redshift has its own compute engine to perform computations and generate critical insights.

To know more about Amazon Redshift, follow the official documentation here.

Amazon Redshift Architecture

Amazon Redshift has a straightforward architecture: a leader node coordinates a cluster of compute nodes that perform analytics on the data.

Amazon Redshift offers JDBC and ODBC connectors so that client applications written in major programming languages like Python, Scala, Java, and Ruby can interact with it.

Key Features of Amazon Redshift

  • Amazon Redshift allows users to write queries and export the data back to the Data Lake.
  • Amazon Redshift can seamlessly query files in formats like CSV, Avro, Parquet, JSON, and ORC directly with the help of ANSI SQL.
  • Amazon Redshift has exceptional support for Machine Learning, and developers can create, train, and deploy Amazon SageMaker models using SQL.
  • Amazon Redshift’s Advanced Query Accelerator (AQUA) can run queries up to 10x faster than other Cloud Data Warehouses.
  • Amazon Redshift’s Materialized Views allow you to achieve faster query performance for ETL, batch job processing, and dashboarding.
  • Amazon Redshift has a petabyte scalable architecture, and it scales quickly as per need.
  • Amazon Redshift enables secure sharing of the data across clusters.
  • Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries.

Amazon Redshift Deep Copy

A Deep Copy is a process Redshift users follow to recreate a table through a bulk insert. The process repopulates the table and automatically sorts it.

While performing a Deep Copy for an Amazon Redshift Copy Table, keep in mind that you should not make concurrent updates during the operation; otherwise, it will be difficult to track the deltas.
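
Before performing a Deep Copy, you can check how much of a table is actually unsorted by querying the SVV_TABLE_INFO system view. A minimal sketch (the SALES table matches the examples used later in this article):

-- Check row count and the percentage of unsorted rows
SELECT "table", tbl_rows, unsorted
FROM svv_table_info
WHERE "table" = 'sales';

A high value in the unsorted column indicates that the table is a good candidate for a Deep Copy.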

Steps to Perform Amazon Redshift Copy Table

There are three ways to perform a Deep Copy for the Redshift Copy Table operation. Let’s have a detailed look at all three approaches listed below:

1) Using the Original Table & DDL Command

This method is useful when the original DDL of the table is available. It is one of the fastest and most preferred ways to perform a Deep Copy for the Amazon Redshift Copy Table.

With the CREATE TABLE DDL available, you can create a new table with all the columns and attributes of the original, including primary and foreign keys.
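
If the original script is no longer at hand, Amazon Redshift can print a table’s DDL with the SHOW TABLE command; a one-line sketch for the SALES table used in the example below:

-- Display the CREATE TABLE statement of an existing table
SHOW TABLE sales;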

Steps to Create Deep Copy for Amazon Redshift Copy Table

  1. Using the CREATE TABLE DDL, create a copy of the parent table.
  2. Use the INSERT command to insert the data from the parent table into the newly created table.
  3. Drop the parent table.
  4. Use the ALTER TABLE command to rename the new table to the original name.

Let’s understand the above steps with an example. Consider a SALES table that contains the sales records of products. First, create a copy of it using its DDL, as shown in the code below:

CREATE TABLE SALES_COPY (
  sales_id int,
  order_id int,
  transaction_id int,
  sales_date timestamp,
  sales_transaction double precision);

INSERT INTO SALES_COPY (SELECT * FROM SALES);
  • Once the data is completely copied to the SALES_COPY table, drop the original table as shown in the code below:
DROP TABLE SALES;
  • Rename the new table to the original table’s name as shown in the code below:
ALTER TABLE SALES_COPY RENAME TO SALES;
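
Note that because the original table is dropped before the rename, a failure midway could leave you without a SALES table. One safeguard is to run steps 2-4 as a single transaction, sketched below (assuming SALES_COPY has already been created as above):

BEGIN;
INSERT INTO SALES_COPY (SELECT * FROM SALES);
DROP TABLE SALES;
ALTER TABLE SALES_COPY RENAME TO SALES;
END;

If any statement fails, the transaction rolls back and the original SALES table stays intact.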

2) Using CREATE TABLE LIKE Command

This method is useful when the CREATE TABLE DDL is not available. It uses the CREATE TABLE LIKE command, which creates a new table that inherits the encoding, distribution key, sort key, etc., of the old table. It doesn’t inherit the old table’s primary and foreign keys.

Steps to Create Deep Copy for Amazon Redshift Copy Table

  1. Using the CREATE TABLE LIKE command, create a copy of the parent table.
  2. Use the INSERT command to insert the data from the parent table into the newly created table.
  3. Drop the parent table.
  4. Use the ALTER TABLE command to rename the new table to the original name.

Let us understand the above steps with an example. Consider a SALES table that contains the sales records of products. Create an empty copy that inherits the parent table’s attributes, as shown in the code below:

CREATE TABLE SALES_COPY (LIKE SALES);

INSERT INTO SALES_COPY (SELECT * FROM SALES);
  • Once the data is completely copied to the SALES_COPY table, drop the original table as shown in the code below:
DROP TABLE SALES;
  • Rename the new table to the original table’s name as shown in the code below:
ALTER TABLE SALES_COPY RENAME TO SALES;
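
You can verify what the new table inherited from the old one by querying the PG_TABLE_DEF catalog view. A quick sketch (note that PG_TABLE_DEF only lists tables in schemas on your search_path):

-- Inspect column encoding, distribution key, and sort key after the rename
SELECT "column", type, encoding, distkey, sortkey
FROM pg_table_def
WHERE tablename = 'sales';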

3) Using a Temporary Table & Truncating the Original Table

This method is useful when the parent table has dependencies and therefore cannot be dropped. Using the temporary table approach improves performance, but it also carries a risk of losing data.

A temporary table is session-based, so it is automatically dropped when the current session ends. This method therefore risks losing data if anything goes wrong between truncating the parent table and reloading it.

Steps to Perform Deep Copy for Amazon Redshift Copy Table

  1. Use the CREATE TABLE AS (CTAS) command to create a temporary table similar to the parent table.
  2. Truncate the parent table.
  3. Use the INSERT command to insert the data from the temporary table back into the parent table.

Let us understand the above steps with an example. Consider a SALES table that contains the sales records of products. The code is shown below:

CREATE TEMP TABLE SALES_TEMP AS (SELECT * FROM SALES);
TRUNCATE SALES;
INSERT INTO SALES (SELECT * FROM SALES_TEMP);
DROP TABLE SALES_TEMP;
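
Keep in mind that TRUNCATE commits immediately in Amazon Redshift, so the reload cannot simply be rolled back. Before dropping the temporary table, it is worth confirming that every row made it back into SALES; a simple sanity check might look like this:

-- Compare row counts before dropping the temporary copy
SELECT (SELECT COUNT(*) FROM SALES) AS reloaded_rows,
       (SELECT COUNT(*) FROM SALES_TEMP) AS temp_rows;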

Conclusion

In this article, you learned about the Amazon Redshift Copy Table operation via the Deep Copy method, which efficiently copies a table and automatically sorts its data. Keeping the data sorted is essential so that users can run quick queries and operations on it.

Visit our Website to Explore Hevo

Amazon Redshift stores data from multiple sources, and every source follows a different schema. Instead of manually writing scripts for every source to perform the ETL (Extract Transform Load) process, one can automate the whole process. Hevo Data is a No-code Data Pipeline solution that can help you transfer data from 100+ sources to Amazon Redshift or any other Data Warehouse of your choice. Its fault-tolerant and user-friendly architecture fully automates the process of loading and transforming data to the destination without writing a single line of code.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about the Amazon Redshift Copy Table in the comments section below!

Vishal Agrawal
Technical Content Writer, Hevo Data

Vishal Agrawal is a Data Engineer with 10+ years of experience in the data field. He has designed scalable and efficient data solutions, and his expertise lies in AWS, Azure, Spark, GCP, SQL, Python, and other related technologies. By combining his passion for writing with the knowledge he has acquired over the years, he wishes to help data practitioners solve the day-to-day challenges they face in data engineering. In his articles, Vishal applies his analytical thinking and problem-solving approach to untangle the intricacies of data integration and analysis.
