Amazon Redshift provides a fast, reliable, and petabyte-scale Data Warehouse solution that effectively connects to Business Intelligence tools, assisting companies in gaining insights into their customers and business operations.

To generate insights, Amazon Redshift facilitates Database Developers and Data Analysts to use Structured Query Language (SQL) on the data residing in the Data Warehouse. This article gives an overview of Amazon Redshift and describes the importance of one of the most popular Data Warehouses.

It also explains the essential components of SQL commands, syntax, parameters, and usage of an Amazon Redshift UPDATE statement. Read along to learn more about using SQL in Amazon Redshift! 

Prerequisites

  • An Amazon Redshift account.
  • Understanding of SQL Commands.
  • Basic knowledge about Redshift Data Warehouse.

Introduction to Amazon Redshift

Redshift Update: Amazon Redshift Logo

Amazon Web Service (AWS) has provided a broad range of products and services, which includes solutions ranging from data storage to enterprise applications. As organizations require various IT services to help them scale their applications with lower IT costs, AWS became one of the most used Cloud Computing platforms in the world.

With AWS Redshift, a user can query and combine petabytes of structured and semi-structured data across a Data Warehouse, Operational Database, or Data Lake using standard SQL programming.

Introduction to SQL Commands

A database is a systematic collection of data that supports electronic storage and manipulation. To query or store information in a database, you need a query language. Although there are several languages, Structured Query Language(SQL) is one of the widely used programming languages for working with databases, thus making it the language of a database.

It is specifically designed to facilitate retrieving information through queries from relational databases. Based on the type of information required, SQL queries are classified into 5 parts below:

Redshift Update: SQL Classification
Redshift Update: SQL Classification
  • Data Definition Language: Data Definition Language (DDL) deals with the structure of tables present inside the database. All the commands of DDL are auto-committed, which means they permanently save all changes in working databases. DDL commands include CREATE, ALTER, DROP, and TRUNCATE.
  • Data Manipulation Language: Data Manipulation Language (DML) deals with the modification of data in databases. DML commands are not auto-committed, and hence they can be rolled back. DML commands include, INSERT, UPDATE, and DELETE.
  • Data Query Language: Data Query Language (DQL) is used to fetch data from a database. It consists of a ‘SELECT’ command to choose attributes of interest. A ‘WHERE’ clause is used with DQL to return specific results from the entire data.
  • Data Control Language: Data Control Language (DCL) deals with the authorization of database users. The DCL consists of ‘GRANT’ and ‘REVOKE’ commands that give database administrators authority to give various permissions by limiting access to other users.
  • Transaction Control Logic: Transaction Control Logic (TCL) deals with a set of tasks as a single execution unit. The TCL commands consist of ‘COMMIT,’ ‘SAVEPOINT,’ ‘ROLLBACK,’ and ‘SET TRANSACTION.’ It is worth noting that TCL cannot be used while creating a table and can be used only with DML commands. Each transaction begins with a specific task and ends when all the relative tasks are logically executed. If any of the tasks fails, the entire transaction process fails. Thus, TCL consists of two results, success, and failure.

Understanding the Amazon Redshift UPDATE Statement

‘UPDATE’ is a DML command used to update or modify the value of a column in the table. It gives the flexibility to retrieve desired data by applying conditions using the ‘WHERE,’ ‘HAVING,’ or ‘GROUP BY’ clause. However, an end-user cannot make any modifications at the schema level in the database, thereby maintaining data privacy.

The following properties will help you while using the Amazon Redshift UPDATE command:

Syntax of Amazon Redshift UPDATE Statement

To work with the Amazon Redshift UPDATE statement, use the following syntax:

[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
        	UPDATE table_name [ [ AS ] alias ] SET column = { expression | DEFAULT } [,...]

[ FROM fromlist ]
[ WHERE condition ]

Parameters in Amazon Redshift UPDATE Statement

To use the Amazon Redshift UPDATE statement, you need to understand the following key parameters:

  • With Clause: It is an optional clause used to specify one or more common-table-expressions.
  • Alias: A temporary alternative name given to a working table is called alias. Although it is optional to consider alias for the table name, using the ‘AS’ keyword to consider alias is considered more professional.
  • table_name: Any temporary or persistent table can be given an alias name. One should note that only the owner of the table or a user with ‘UPDATE’ privilege on a table can update rows. For instance, if a ‘FROM’ clause is used in an expression, a user must have ‘SELECT’ privilege on those tables.
  • SET column: When a user wants to modify one or more columns, ‘SET’ is used to update such columns. The unspecified columns will save their current value.
  • Expression: The expression determines the new value of the specified column. By default, the column is updated with the default value that the column is assigned when using the CREATE table operator.
  • FROM tablelist: When updating a table, a user can give reference to the information present in another table. Such tables are listed either in the ‘FROM’ clause or approached using a subquery with a ‘WHERE’ condition.
  • WHERE condition: A ‘WHERE’ clause is an optional clause that restricts the update command to certain matching rows.

Usage of Amazon Redshift UPDATE Statement

Amazon Redshift allows a maximum size of 16 MB for any single SQL statement. When updating a large number of lines in the table, you must restore storage and recourse values. 

A JOIN clause is used to combine rows from two or more tables based on the related columns. Following is the usage of the UPDATE statement when combined with the JOIN clause:

  • Left, Right, and Full outer joins would return an error if updated using the FROM clause.
  • While specifying an outer join, use a subquery in the WHERE clause of an UPDATE statement.
  • If the UPDATE statement requires a self-join to a target table, a JOIN Condition must be specified with WHERE clause criteria that qualify rows for the update operation. 

Examples of Amazon Redshift UPDATE Statement

Redshift Update: Database Structure
Redshift Update: Database Structure

To understand the Amazon Redshift UPDATE statement, let’s consider a sample database called TICKIT readily available in Amazon Redshift documentation. This small database has 7 tables of which 2 are fact tables and the remaining are dimensions tables. The category table in the TICKIT database contains the following rows.

Redshift Update: Amazon Redshift Table
Redshift Update: Amazon Redshift Table

Now, you will learn the following examples of updates on the above table:

1) Table Update Using a Range of Values

To update the CATGROUP column based on a range of values in the CATID column, follow the below given Amazon Redshift UPDATE query:

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

2) Table Update Using a Current Value

To update the CATNAME and CATDESC columns based on their current CATGROUP value, follow the below given Amazon Redshift UPDATE query:

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

In the above case, if the CATDESC column does not have any defined default value, it is automatically set to a null value when the table is created.

3) Table Update Using the Result of a WHERE Clause Subquery

When you update the CATEGORY table based on the result of a subquery in the WHERE clause, follow the below given Amazon Redshift UPDATE query:

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

On successful execution of the above query, you will see the below output: 

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

4) Table Update Using the Result of a Join Condition

To update the original 11 rows in the CATEGORY table based on matching CATID rows in the EVENT table, follow the below given Amazon Redshift UPDATE query:

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

In the above query, one should note that the EVENT table is listed in the FROM clause, and the WHERE clause is used to define the join condition to the target table. Only four rows with CATID 6, 7, 8, and 9 represented in the EVENT table qualified for the update.

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

5) Table Update Using the Outer Joins in the FROM Clause

While performing an outer join to the target table with a FROM clause and an UPDATE statement, you will witness an error:

Redshift Update: Update Statement Example Code
Redshift Update: Update Statement Example Code

To use the UPDATE statement, it is recommended to use outer join syntax as a subquery. Following is the required Amazon Redshift UPDATE query:

update category set catid=100
from
(select event.catid from event left join category cat on event.catid=cat.catid) eventcat
where category.catid=eventcat.catid
and catgroup='Concerts';

Conclusion

This article introduced Amazon Redshift and discussed the various type of SQL queries. Moreover, it explained the UPDATE command that is extremely useful when working with Amazon Redshift. It also provided the examples, use and syntax for using the Amazon Redshift UPDATE statement.

When dealing with large datasets, querying commands experience an understandable lag in results. Amazon Redshift Data Warehouse clocks one of the fastest query speeds, as its massively parallel processing capabilities enable BI tools to generate insights quickly. In addition, Amazon Redshift is cost-efficient and handles huge data by providing scalable solutions for organizations that connect seamlessly with other AWS products to meet the desired needs.

Now, you may want to go one step further and perform analysis on the data stored in Amazon Redshift Data Warehouse. This will require you to transfer data from numerous sources to your Amazon Redshift account using various complex ETL processes.

Hevo Data will automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. This platform allows you to transfer data from 150+ data sources to Cloud-based Data Warehouses like Amazon Redshift, Snowflake, Google BigQuery, etc. It will provide you a hassle-free experience and make your work life much easier.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of Amazon Redshift UPDATE statement in the comments below!

mm
Freelance Technical Content Writer, Hevo Data

Amit Kulkarni specializes in freelance writing within the data industry, by creating informative and engaging content on data science by using his problem-solving and analytical thinking ability.

No Code Data Pipeline For Your Amazon Redshift Data Warehouse