Press "Enter" to skip to content

Redshift INSERT INTO – Usage and Coding Examples

What is Amazon Redshift?

Amazon Redshift is a completely managed data warehouse offered as a service. It is a columnar database with a PostgreSQL-standard querying layer. Redshift’s biggest selling point is flexibility, which it offers customers through its ‘pay as you go’ pricing model. Redshift is tailor-made for executing lightning-fast complex queries over millions of rows of data. The customer is also relieved of all the maintenance and infrastructure management activities involved in keeping a highly available data warehouse running. Redshift is built on a cluster architecture and allows users to select from multiple kinds of instances depending on their use cases. Read more about Redshift’s features here.

Our intention in this post is to look in detail at the Amazon Redshift INSERT INTO statement. The post covers the following:

  1. The need for INSERT INTO
  2. INSERT INTO – Usage and examples
  3. INSERT INTO – Limitations
  4. An easier way to load data to Redshift

Let us deep dive into this.


Redshift INSERT INTO – Need 

Being a data warehouse offered as a service, Redshift is typically used as part of an extract-transform-load (ETL) pipeline. This means there is an obvious need to insert rows of data into Redshift continuously, depending on the incoming data rate. These continuous inserts may not be a typical use case for a bulk load, since the data as a whole may not be at rest and may instead arrive as intermittent streams. In such cases, the available options are to either write data to Redshift as individual rows or create batches of rows for a multi-row insert. In both cases, the INSERT INTO statement is the one to use. Let’s look into the details of how to use this statement.

Redshift INSERT INTO – Steps to Follow

Redshift INSERT INTO follows the below template.

INSERT INTO table_name [ ( column [, ...] ) ]
{ DEFAULT VALUES |
  VALUES ( { expression | DEFAULT } [, ...] )
    [, ( { expression | DEFAULT } [, ...] ) [, ...] ] |
  query }

Let’s understand the purpose of each parameter in the above template.

Redshift allows the user to specify an optional list of columns. If the columns are specified, the values need to be entered in the same order. If the columns are not specified, every column of the table must have an entry in the value list, in the same order in which the columns were defined.

DEFAULT VALUES is a keyword which, when used with the INSERT INTO statement, inserts a row in which every column takes the default value set during table creation.
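For instance, a minimal sketch using the employee_records table introduced later in this post (this assumes every column either has a default or is nullable):

-- Inserts one row consisting entirely of column defaults
INSERT INTO employee_records DEFAULT VALUES;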

Redshift allows the use of an expression in place of a column value. Redshift will automatically insert the value after evaluating the expression and converting the result to match the column’s data type.
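As a quick illustration, here is a sketch against the employee_records table used in the examples below, where the department value is computed by an expression:

-- UPPER('admin') is evaluated first, then cast to the column's data type
INSERT INTO employee_records (emp_id, department, designation, category)
VALUES (4, UPPER('admin'), 'assistant', 'contract');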

The user can also provide the DEFAULT keyword for any individual column in order to write that column’s default value.
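A short sketch of this, again against the employee_records table used below (a default for the category column is assumed to exist, purely for illustration):

-- category takes whatever default was defined at table creation
INSERT INTO employee_records (emp_id, department, designation, category)
VALUES (5, 'admin', 'assistant', DEFAULT);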

The INSERT INTO statement also allows the use of a query in place of the VALUES clause. Redshift will execute the query and insert all the resulting rows, provided the results are compatible with the table’s column structure.

Let us now look into some specific examples of single and multi-row inserts. Consider the below table called employee_records with fields emp_id, department, designation, and category.
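The post only names the fields, so the column data types below are assumptions chosen for illustration; a minimal sketch of such a table could look like this:

CREATE TABLE employee_records (
    emp_id      INTEGER,
    department  VARCHAR(50),
    designation VARCHAR(50),
    category    VARCHAR(50)
);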

Single row insert

INSERT INTO employee_records (emp_id, department, designation, category) VALUES (1, 'admin', 'assistant', 'contract');

Single row without column list

INSERT INTO employee_records VALUES (1, 'admin', 'assistant', 'contract');

The advantage of this kind of INSERT INTO is obviously the shorter query length. The disadvantage is that it is error-prone: validation will fail if a change in the order of columns causes a data type mismatch. It is recommended to use INSERT INTO with a column list as much as possible.

Multi-row insert

INSERT INTO employee_records (emp_id, department, designation, category)
VALUES (1, 'admin', 'assistant', 'contract'),
       (2, 'admin', 'senior assistant', 'permanent'),
       (3, 'security', 'manager', 'permanent');

Even though it is possible to insert a large number of rows through this method, it is advised to use the COPY command instead if the use case is a bulk load. The COPY command has optimizations built in for faster loading of data.
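For comparison, here is a hedged sketch of a COPY-based bulk load; the S3 path and IAM role ARN are placeholders rather than real resources:

COPY employee_records
FROM 's3://my-bucket/employee_records/'                       -- hypothetical S3 path
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'      -- hypothetical role ARN
FORMAT AS CSV;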

INSERT INTO using query

For this example, consider there is another table called employee_records_stg with the same structure. To insert values into this table, use the below statement.

INSERT INTO employee_records_stg (SELECT * FROM employee_records);
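The query can also name specific columns and filter rows. A sketch of such a variant (the filter value is purely illustrative):

INSERT INTO employee_records_stg (emp_id, department, designation, category)
SELECT emp_id, department, designation, category
FROM employee_records
WHERE category = 'permanent';  -- illustrative filter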

Redshift INSERT INTO – Limitations

Even though INSERT INTO is the recommended way of inserting rows when it comes to an intermittent stream of records, it does have its share of limitations.

  1. The user needs to be extremely careful about the data types of the incoming data and the table structure, since Redshift will automatically convert data types to conform to the table structure wherever possible.
  2. It is not suitable for large amounts of data, and attempting such loads may result in painfully slow inserts.
  3. Using this statement without the column list is error-prone because of the default behaviour in case the value list does not match the column structure. If the value list has fewer entries than the column list, Redshift will try to insert the values into the first n columns and will not raise an error if the data types are compatible (see the sketch after this list).
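To illustrate the third point, here is a sketch of the behaviour described above, reusing the employee_records table from the earlier examples:

-- Only two values for a four-column table: emp_id and department are
-- populated, and per the behaviour described above no error is raised
-- since the data types are compatible
INSERT INTO employee_records VALUES (6, 'admin');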

The Redshift INSERT INTO command is very easy to use and conforms to the standard PostgreSQL querying protocol. However, the user should be mindful of the above-mentioned limitations. Most of the time, query validation may not even raise an error; Redshift might silently perform its own automatic conversions. When it comes to data warehouses with millions of rows, such discrepancies and errors are difficult to hunt down and fix. One of the best ways to steer clear of such issues is to use a fully managed Data Integration Platform such as Hevo.

Hevo for Redshift ETL

Hevo Data is built to handle all the limitations posed by INSERT INTO. Hevo can help load data from any data source to Redshift through a very easy-to-use interface. Hevo provides a code-free, hassle-free platform to load data into Redshift in real time with zero data loss.

Explore Hevo’s complete feature set here or sign up for a 14-day free trial to experience the platform.

How do you currently move data to Redshift? Let us know in the comments.
