Redshift Temp Tables: Syntax, Commands, & Tips Simplified 101

on Amazon Redshift, Data Warehouses, Temporary Tables, Tutorials • September 22nd, 2021


Venturing into Data Science and choosing a tool to solve a given problem can be challenging, especially when you have a wide array of choices. In this age of data transformation, organizations are constantly seeking ways to improve their day-to-day handling of data and to minimize the cost of these operations. Handling such data transformations in the Cloud has therefore become imperative, as it is much easier to manage and is also cost-efficient.

Data Warehousing architectures have changed rapidly over the years, and most of the notable service providers are now Cloud-based. Companies are therefore increasingly moving to align with such Cloud offerings, as they provide a lower upfront cost and better scalability and performance than traditional On-premise Data Warehousing systems. Amazon Redshift is an excellent replacement for traditional data warehouses, as it addresses several flaws of dated in-house systems.

Unlike on-premise data warehouse solutions, Amazon Redshift provides users with more flexibility at a lower cost. Moreover, it supports SQL features like creating Redshift Temp Tables to store non-permanent data. In this article, you will learn how to set up Redshift temporary tables or Redshift Temp Tables and work effectively to implement them.

In this article, you will be introduced to Amazon Redshift and its key features, learn about the concepts of Temporary and Permanent tables in Redshift, and walk through the steps involved in setting up Amazon Redshift Temp Tables. Read along to learn more about Amazon Redshift Temp Tables.


Prerequisite

  • Understanding of SQL
  • Understanding of databases

What is Amazon Redshift?


Amazon Web Services (AWS) is a subsidiary of Amazon responsible for providing cloud computing platforms and APIs to individuals, corporations, and enterprises. AWS offers high computing power, efficient content delivery, and database storage with increased flexibility, scalability, and reliability, all at relatively inexpensive rates.

Download the Cheatsheet on How to Set Up High-performance ETL to Redshift
Learn the best practices and considerations for setting up high-performance ETL to Redshift

Amazon Redshift, a part of AWS, is a Cloud-based Data Warehouse service designed by Amazon to handle large data and make it easy to discover new insights from them. Its operations enable you to query and combine exabytes of structured and semi-structured data across various Data Warehouses, Operational Databases, and Data Lakes.

Amazon Redshift is built on industry-standard SQL with functionalities to manage large datasets, support high-performance analysis, provide reports, and perform large-scale database migrations. Redshift has a concept of temporary tables, or Redshift Temp tables, which exist just for the duration of the session and automatically discard their data when the session ends.

For further information on Amazon Redshift, you can follow the Official Documentation.

Key Features of Amazon Redshift

The key features of Amazon Redshift are as follows:

1) Massively Parallel Processing (MPP)


Massively Parallel Processing (MPP) is a distributed design approach in which several processors apply a divide-and-conquer strategy to large data jobs. A large processing job is broken down into smaller jobs, which are then distributed among a cluster of Compute Nodes. These Nodes perform their computations in parallel rather than sequentially. As a result, there is a considerable reduction in the amount of time Redshift requires to complete a single, massive job.

2) Fault Tolerance


Data Accessibility and Reliability are of paramount importance for any user of a database or a Data Warehouse. Amazon Redshift monitors its Clusters and Nodes around the clock. When any Node or Cluster fails, Amazon Redshift automatically replicates all data to healthy Nodes or Clusters.

3) Redshift ML


Amazon Redshift houses a functionality called Redshift ML that gives data analysts and database developers the ability to create, train, and deploy Amazon SageMaker models using SQL seamlessly.

4) Column-Oriented Design


Amazon Redshift is a Column-oriented Data Warehouse. This makes it a simple and cost-effective solution for businesses to analyze all their data using their existing Business Intelligence tools. Amazon Redshift achieves optimum query performance and efficient storage by leveraging Massively Parallel Processing (MPP), Columnar Data Storage, along with efficient and targeted Data Compression Encoding schemes.

Simplify Redshift ETL and Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ different sources (including 40+ free sources) to a Data Warehouse such as Amazon Redshift or a Destination of your choice in real-time in an effortless manner. Hevo, with its minimal learning curve, can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with a multitude of sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line.

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What is a Redshift Temp Table?

Amazon Redshift allows users to create temporary tables by making use of the temp or temporary keyword while creating new tables for their Amazon Redshift instance. The Redshift temp tables exist just for the duration of the session, and they immediately delete the data when the session ends. Redshift temp tables are often generated in order to be joined into a query later in the session. For the upcoming join operation, it is critical to build the Redshift Temporary table with the best column encoding, distribution key, and sort key.

One distinction between regular tables and temporary tables is how they are typically used. Because Redshift Temp tables are session-scoped, including them in a process or report will almost certainly mean creating the Redshift Temp Tables multiple times. Redshift Temp tables may seem similar to regular tables; however, most regular tables are not rewritten every time they are queried. The objective is to combine control over the table schema with ease of use and adaptability, making it simple to ensure that the Redshift Temp table is appropriately built.


Different statements used to create Redshift Temp tables and insert values into them are as follows:

1) CREATE TABLE Statement

CREATE is a Data Definition Language (DDL) command in Amazon Redshift, which is used to create new tables for your Amazon Redshift instance. Users must give the table and its columns names, as well as the data types that will be used in them. The following syntax can be used with the Amazon Redshift Create Table command:

CREATE TABLE table_name (
column_name1 data_type1,
column_name2 data_type2
);

For Example:

CREATE TABLE Books
(
book_id        integer,
book_name      varchar(50),
author_name    varchar(50),
publisher_name varchar(60),
editions       integer
);

Amazon Redshift also allows users to easily access or load their Amazon Redshift data by modifying their instructions with parameters/keywords such as “Like,” “As,” “Default,” and so on.

An advanced syntax of Create Table function will look something like this:

CREATE [ [ LOCAL ] { TEMPORARY | TEMP } ] TABLE
[ IF NOT EXISTS ] table_name
( column_name data_type [ column_attributes ] [ column_constraints ]
  | table_constraints
  | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] [, ... ] )
[ table_attributes ];

where column_attributes are:
  [ DEFAULT default_expr ]
  [ IDENTITY ( seed, step ) ] 
  [ GENERATED BY DEFAULT AS IDENTITY ( seed, step ) ]             
  [ ENCODE encoding ] 
  [ DISTKEY ]
  [ SORTKEY ]
  [ COLLATE CASE_SENSITIVE | COLLATE CASE_INSENSITIVE  ]

and column_constraints are:
  [ { NOT NULL | NULL } ]
  [ { UNIQUE  |  PRIMARY KEY } ]
  [ REFERENCES reftable [ ( refcolumn ) ] ] 

and table_constraints  are:
  [ UNIQUE ( column_name [, ... ] ) ]
  [ PRIMARY KEY ( column_name [, ... ] )  ]
  [ FOREIGN KEY (column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ] ]

and table_attributes are:
  [ DISTSTYLE { AUTO | EVEN | KEY | ALL } ] 
  [ DISTKEY ( column_name ) ]
  [ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [,...]) |  [ SORTKEY AUTO ] ]
  [ ENCODE AUTO ]

A) Parameters of CREATE TABLE Command

The parameters of CREATE TABLE Command are as follows:

I) Local Clause

This is an optional keyword that is accepted in the statement but has no effect on Amazon Redshift.

CREATE [ LOCAL ] TABLE table_name

II) Temporary or Temp Clause

Users can build temporary tables in Amazon Redshift by using the Temp or Temporary keyword when creating new tables for their Amazon Redshift instance. In Redshift, a temporary table is only visible to the current session. Redshift Temp Tables are dropped at the end of the session they are created in and cannot be recovered in any way. The Redshift Temp Table is generated in a distinct, session-specific Redshift schema, though users cannot give this schema a name. The schema is private and hidden from the user.

Users can either use the temp keyword or a # sign right before the table name to create a temporary table (Redshift Temp Table). In other words, to create a Redshift Temp Table, simply specify the TEMPORARY keyword (or TEMP abbreviation) or # sign in your CREATE TABLE DDL statement.

Syntax 1: 

CREATE TABLE #table_name (
column_name1 data_type1,
column_name2 data_type2
);

Syntax 2:

CREATE TEMPORARY TABLE table_name (
column_name1 data_type1,
column_name2 data_type2
);

Syntax 3:

CREATE TEMP TABLE table_name (
column_name1 data_type1,
column_name2 data_type2
);

The name of a Temporary table can be the same as that of a Permanent table. Because of the way the search path is set up, the temporary table will take precedence over the permanent one unless you explicitly refer to the permanent table by qualifying it with its schema name.
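
For illustration, assuming a permanent table named public.books already exists (names here are hypothetical), this shadowing behavior might look like:

```sql
-- A session-scoped temp table with the same name as the permanent table.
CREATE TEMP TABLE books (book_id integer, book_name varchar(50));

SELECT * FROM books;        -- resolves to the temporary table
SELECT * FROM public.books; -- schema-qualified: the permanent table
```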

  • Staging Redshift Temp Table

The staging table is a Redshift Temp Table (temporary table) that stores all of the data that will be needed to perform updates and inserts to the target database.

The staging table and the target table must be joined for a merge operation to work. Set the staging table’s distribution key to the same column as the destination table’s distribution key to collocate the joining rows.
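
As a hedged sketch of this staging pattern, using a hypothetical books target table and assuming book_id is its distribution key (LIKE copies the parent's distribution and sort keys):

```sql
-- Stage incoming rows in a temp table collocated with the target.
CREATE TEMP TABLE books_staging (LIKE books);
-- COPY or INSERT the incoming rows into books_staging here.

BEGIN;

-- Update rows that already exist in the target.
UPDATE books
SET book_name = s.book_name
FROM books_staging s
WHERE books.book_id = s.book_id;

-- Insert rows that are new.
INSERT INTO books
SELECT s.*
FROM books_staging s
LEFT JOIN books b ON s.book_id = b.book_id
WHERE b.book_id IS NULL;

COMMIT;

DROP TABLE books_staging;
```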

  • Global Redshift Temp Table

Users can also construct a globally accessible Redshift Temp Table (temporary table) by placing double hash (##table_name) symbols in front of the table name. When the person or process that generated them is no longer active, global Redshift Temp Tables expire as well. But, comparatively, a global Redshift Temp Table can be accessed by any database user while it is active.

III) If Not Exists Clause

Redshift lets customers employ the “If not exists” clause/parameter in their Amazon Redshift Create Table commands to prevent errors when creating duplicate tables. If the “If not exists” clause/parameter is set, Amazon Redshift will first check whether the table already exists. If it does, the command completes without creating a new table and without raising an error; without the clause, attempting to create a table that already exists fails with an error.

Syntax:

CREATE TABLE
[ IF NOT EXISTS ] table_name

IV) Default Clause

The default keyword is used to avoid problems with NULL values and has the following syntax:

column_name data_type DEFAULT Default_Value

For example:

author_name varchar(50) DEFAULT 'Neil Gaiman',

If the user does not give any input for that column with default values in place, the default value will be applied automatically.
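
Putting this together with the earlier Books example (the table and default value here are illustrative):

```sql
CREATE TABLE books_with_default (
  book_id     integer,
  book_name   varchar(50),
  author_name varchar(50) DEFAULT 'Unknown'
);

-- author_name is omitted, so the default value 'Unknown' is applied.
INSERT INTO books_with_default (book_id, book_name)
VALUES (1, 'Anxious People');
```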

  • table_name

This is the name of the table that will be created. It can be up to 127 bytes.

  • column_name

It is for the name of a column that will be added to the new table. The column name can be up to 127 bytes long.

  • data_type

The data type of the column being created.

V) IDENTITY(seed, step) Clause

The clause identifies the column as an IDENTITY column.
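
A minimal sketch of an IDENTITY column, with illustrative seed and step values:

```sql
-- book_id starts at 1 and increments by 1 for each inserted row.
CREATE TABLE books_auto (
  book_id   bigint IDENTITY(1, 1),
  book_name varchar(50)
);

-- The IDENTITY column is omitted; Redshift generates its value.
INSERT INTO books_auto (book_name) VALUES ('Anxious People');
```

Note that Redshift guarantees generated IDENTITY values are unique, but not that they are consecutive.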

VI) PRIMARY KEY Clause

A Primary Key indicates that this group of columns may be used as a unique identifier for rows in other tables. This constraint will imply that a column or set of columns in a table can only contain non-null values that are unique (non-duplicate). Choosing a collection of columns as the Primary Key also offers information about the schema’s architecture. 

VII) FOREIGN KEY Clause

A Foreign Key constraint indicates that a group of one or more columns in the new table can only have values that match those in the referred column or columns of some row of the referenced table.

VIII) BACKUP { YES | NO } Clause

Whether or not the table should be included in automated and manual cluster snapshots is specified by this clause.

IX) DISTSTYLE Clause

The Data Distribution Style for the entire table is defined by this keyword. The rows of a table are distributed among compute nodes by Amazon Redshift according to the table’s distribution style, i.e., AUTO, EVEN, KEY, or ALL.

X) DISTKEY Clause

It is the constraint that defines the column that will be used as the table’s distribution key.

XI) SORTKEY Clause

This parameter defines one or more sort keys for the table. When data is imported into the table, the columns selected as sort keys are used to sort the data.
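
As an illustration, a temporary table that sets all three of these table attributes (the table and column names are hypothetical):

```sql
CREATE TEMP TABLE sales_tmp (
  sale_id integer,
  book_id integer,
  sold_at date
)
DISTSTYLE KEY
DISTKEY (book_id)   -- rows are distributed by book_id
SORTKEY (sold_at);  -- rows are stored sorted by sale date
```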

2) CREATE TABLE AS Statement

Create Table As (CTAS) instructions are used in Redshift to create tables from the results of a SELECT query. The new table inherits its structure and data from the query’s result set. However, it doesn’t inherit constraints, default values, etc.

Syntax:

CREATE TABLE Old_Books AS Select * from Books;

For the Redshift Temp table (Temporary table):

CREATE TEMP TABLE Old_Books AS Select * from Books;
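
Because CTAS does not carry over the source table’s keys, you may want to set them explicitly; a sketch using the Books example:

```sql
-- Distribution and sort keys are declared before AS.
CREATE TEMP TABLE old_books
DISTKEY (book_id)
SORTKEY (book_id)
AS SELECT * FROM books;
```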

3) CREATE TABLE LIKE Statement

This statement allows users to define a distinct table whose structure they want to duplicate without copying any data. The difference between the Create Table Like and Create Table As commands is as follows: The Create table LIKE clause creates a table based on a parent table’s structure, whereas the Create table AS (CTAS) clause creates a table based on a SQL query.

Add the “INCLUDING DEFAULTS” clause if you want the column level default values from the parent table transferred across. 

CREATE TABLE Books_backup (LIKE Books INCLUDING DEFAULTS);

If you wish to create a table without column defaults, such as NULL defaults, you may use the “EXCLUDING DEFAULTS” clause or just skip the statement as seen below.

CREATE TABLE Books_backup (LIKE Books);

Or

CREATE TABLE Books_backup (LIKE Books EXCLUDING DEFAULTS);

For creating a temporary table (Redshift Temp Table):

CREATE TEMP TABLE Books_backup (LIKE Books INCLUDING/EXCLUDING DEFAULTS);

4) INSERT INTO Statement

Using the INSERT INTO command and specifying values for all columns is the easiest approach to insert a record in Redshift.

Assuming the Books table has only three columns, book_id, book_name, and author_name, in that order:

INSERT INTO Books VALUES (573, 'Anxious People', 'Fredrik Backman');

Users can also specify the column list and use the VALUES keyword in the above syntax.

INSERT INTO Books (book_id, book_name, author_name)
VALUES (573, 'Anxious People', 'Fredrik Backman');

The latter is preferable since users don’t have to remember the table’s column order or provide values for all columns, only the ones that are necessary. It is simpler to match a value to the column it is meant for when seeing it in the statement rather than having to look at the table description if there are numerous columns.

Once a column is added to or deleted from the table, INSERT statements without a column list are invalidated. As a result, users will need to update their queries to reflect the new or removed column in order to resume their work. The same format also applies to a Temporary table.
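
The column-list form also extends naturally to multi-row inserts, which are more efficient than issuing one statement per row (values here are illustrative):

```sql
INSERT INTO Books (book_id, book_name, author_name)
VALUES
  (573, 'Anxious People', 'Fredrik Backman'),
  (574, 'Beartown',       'Fredrik Backman');
```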

After a table is populated, you can query it using:

SELECT * from table_name;

Users can create Temporary tables directly from Permanent tables and load values into them using a SELECT … INTO statement, which defines and populates the new table in a single step.

For Example:

SELECT book_id, book_name, author_name
INTO #tmp_books
FROM books
WHERE editions > 2;

This will create a Temporary table and insert the book_id, book_name, and author_name of every book in the physical books table that has more than two editions.

  • Note: Keep in mind that every time a subquery is performed, it creates a new temporary table which can cause frequent rewriting. If you alter your query to combine the logic into a single, bigger query, the disc writes required in filling the temporary table may be more costly than the reads. Do consider the IO cost of executing writes and then reading the data back into a subsequent query before utilizing temporary tables to aid in process debugging or to improve readability.

5) DROP TABLE Statement

This has the simplest syntax; however, once this command is issued, the action cannot be reversed.

DROP TABLE table_name;

There are two ways to get rid of a Redshift Temp Table. The first is to disconnect from the session in which the Redshift Temp Table was created. The other option is to use the command below.

DROP TABLE #table_name; (for Redshift temp table) 
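
To make cleanup idempotent in scripts, the IF EXISTS form avoids an error when the table has already been dropped (the table name here is illustrative):

```sql
-- Succeeds whether or not #tmp_books still exists.
DROP TABLE IF EXISTS #tmp_books;
```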

Conclusion

In this article, you have learned about how to use the Amazon Redshift Create Table command to create new permanent and temporary tables in Amazon Redshift successfully. This article also provided information on Amazon Redshift, its key features, and Redshift Temporary Tables.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ sources (including 30+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Amazon Redshift, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools. 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing pricing, which will assist you in selecting the best plan for your requirements.

Share your experience of working with Amazon Redshift Temp Tables in the comment section below! We would love to hear your thoughts.

No-code Data Pipeline for Amazon Redshift