Redshift PIVOT & UNPIVOT: The Ultimate How-to Guide [Practical Examples]

• January 7th, 2022

REDSHIFT PIVOT & UNPIVOT OPERATOR: THE ULTIMATE GUIDE: Cover

Redshift PIVOT and UNPIVOT are two separate manifestations of the same relationship/relational operator. In order to achieve the desired dataset the two operators — Redshift PIVOT and UNPIVOT — convert rows into columns and columns into rows.

In general, Redshift PIVOT and UNPIVOT operators are useful to manage multiple tables of datasets and transform the information present in an easy-to-understand format.

In this tutorial article, we will discuss these two concepts in detail, and with some examples show you how these operators work.

Table of Contents

  1. What is Amazon Redshift?
  2. What is Redshift PIVOT operator in Redshift?
  3. What is Redshift UNPIVOT operator in Redshift?
  4. Conclusion

What is Amazon Redshift?

Redshift PIVOT & UNPIVOT: The Ultimate How-to Guide | Amazon Redshift logo

Redshift is a petabyte-scale data warehouse solution built and designed for data scientists, data analysts, data administrators, and software developers. Its parallel processing and compression algorithm allow users to perform operations on billions of rows, reducing command execution time significantly. Redshift is perfect for analyzing large quantities of data with today’s business intelligence tools in multiple data warehouses.

Amazon Redshift is the best for round-the-clock computational needs — like NASDAQ daily reporting, automated ad-bidding, and live dashboards. Redshift’s users pay for an hourly rate — depending on the instance type and the number of nodes deployed.

Amazon Redshift architecture is based on an extensive communication channel between the client application and the data warehouse cluster. These two communicate with each other using the industry-standard JDBC and ODBC drivers for PostgreSQL. The Data warehouse cluster has leader node, compute node, and node slices — these are the core infrastructure components of Redshift’s architecture.

  • Leader Node: The leader node communicates with the client application and the compute nodes. It resolves, develops, and executes database operations necessary to obtain results for multiple complex queries.
  • Compute Node: A compute node is made up of multiple node slices. Each compute node has its CPU, memory, and attached disk storage. It performs computational work in a cluster. The user can select from options available for various node types, which can provide an optimum experience based on computation and storage needs.
  • Node Slice: A group of slices makes a node slice. A slice is a logical partition of a disk. Each slice has its allocated share of the node’s memory and disk space. The number of slices per node depends on the node’s size.
Redshift PIVOT & UNPIVOT: Redshift Architecture
Redshift Architecture

Scale your data integration effortlessly with Hevo’s Fault-Tolerant No Code Data Pipeline

As the ability of businesses to collect data explodes, data teams have a crucial role to play in fueling data-driven decisions. Yet, they struggle to consolidate the data scattered across sources into their warehouse to build a single source of truth. Broken pipelines, data quality issues, bugs and errors, and lack of control and visibility over the data flow make data integration a nightmare.

1000+ data teams rely on Hevo’s Data Pipeline Platform to integrate data from over 150+ sources in a matter of minutes. Billions of data events from sources as varied as SaaS apps, Databases, File Storage and Streaming sources can be replicated in near real-time with Hevo’s fault-tolerant architecture. What’s more – Hevo puts complete control in the hands of data teams with intuitive dashboards for pipeline monitoring, auto-schema management, custom ingestion/loading schedules. 

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software on review sites.

Take our 14-day free trial to experience a better way to manage data pipelines.

What is Redshift PIVOT operator in Redshift?

Redshift PIVOT operator converts rows into columns. Let’s take the example of a table that looks like this:

Student NameCourse Score
1Simran Physics 98
2AlexanderComputer Science89
3Simran Computer science 86
4AlexanderPhysics78
Table 1

And when we PIVOT it by the second column, the new table will look as follows:

Student Name Physics Computer Science 
1Simran 9886
2Alexander7889
Table 2

In the first table, two unique values for the course column are given — Physics and Computer Science. But after using Redshift PIVOT operator the unique values have been transformed into columns and the Course column into the row.

Now, let’s check out some PIVOT examples in Redshift.

First, set up a sample table (as shown below) and use it to run the subsequent queries.

CREATE TABLE part (
    partname varchar, 
    manufacturer varchar, 
    quality int, 
    price decimal(12, 2)
);

INSERT INTO part VALUES ('P1', 'M1', 2, 10.00);
INSERT INTO part VALUES ('P1', 'M2', NULL, 9.00);
INSERT INTO part VALUES ('P1', 'M3', 1, 12.00);

INSERT INTO part VALUES ('P2', 'M1', 1, 2.50);
INSERT INTO part VALUES ('P2', 'M2', 2, 3.75);
INSERT INTO part VALUES ('P2', 'M3', NULL, 1.90);

INSERT INTO part VALUES ('P3', 'M1', NULL, 7.50);
INSERT INTO part VALUES ('P3', 'M2', 1, 15.20);
INSERT INTO part VALUES ('P3', 'M3', NULL, 11.80);

Now, PIVOT on partname with an AVG aggregation on price.

SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
    AVG(price) FOR partname IN ('P1', 'P2', 'P3')
);

The result for the above-mentioned query is as follows

  p1   |  p2  |  p3   
-------+------+-------
 10.33 | 2.71 | 11.50

Note: The result following the previous PIVOT query is similar to the following GROUP BY query. But, after running the query, the rows are transformed into column names and columns.

SELECT partname, avg(price)
FROM (SELECT partname, price FROM part)
WHERE partname IN ('P1', 'P2', 'P3')
GROUP BY partname;

The following query results in output as mentioned below:

 partname |  avg  
----------+-------
 P1       | 10.33
 P2       |  2.71
 P3       | 11.50

Now, an example with manufacturer as a column.

SELECT *
FROM (SELECT quality, manufacturer FROM part) PIVOT (
    count(*) FOR quality IN (1, 2, NULL)
);

The following output for the above-mentioned query is as follows:

 manufacturer | 1 | 2 | null 
--------------+---+---+------
 M1           | 1 | 1 |    1
 M2           | 1 | 1 |    1
 M3           | 1 | 0 |    2

Note: Redshift PIVOT operator returns similar data (with the query given below), including GROP BY. The only difference is, the PIVOT operator returns 0 for column 2 and the manufacturer M3. The Group BY operator does not have a corresponding row — in most cases PIVOT inserts NULL if a row does not have input data for a given column.

SELECT manufacturer, quality, count(*)
FROM (SELECT quality, manufacturer FROM part)
WHERE quality IN (1, 2) OR quality IS NULL
GROUP BY manufacturer, quality;

The query gives out the following output:

 manufacturer | quality | count 
--------------+---------+-------
 M1           |       1 |     1
 M1           |       2 |     1
 M1           |         |     1
 M2           |       1 |     1
 M2           |       2 |     1
 M2           |         |     1
 M3           |       1 |     1
 M3           |         |     2

Redshift PIVOT operator accepts optional aliases on the aggregate expression, and for each value for the IN operator, too. It is recommended to use aliases and then customize column identity. If there is no aggregate alias, then only the IN list aliases are used. Otherwise, the aggregate alias is appended to the column name with an underscore to separate the names.

SELECT *
FROM (SELECT quality, manufacturer FROM part) PIVOT (
    count(*) AS count FOR quality IN (1 AS high, 2 AS low, NULL AS na)
);

The query gives out the following output:

 manufacturer | high_count | low_count | na_count 
--------------+------------+-----------+----------
 M1           |          1 |         1 |        1
 M2           |          1 |         1 |        1
 M3           |          1 |         0 |        2

All of the capabilities, none of the firefighting

Using manual scripts and custom code to move data into the warehouse is cumbersome. Frequent breakages, pipeline errors and lack of data flow monitoring makes scaling such a system a nightmare. Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work.

  • Reliability at Scale: With Hevo, you get a world-class fault-tolerant architecture that scales with zero data loss and low latency. 
  • Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs 
  • Stay in Total Control: When automation isn’t enough, Hevo offers flexibility – data ingestion modes, ingestion, and load frequency, JSON parsing, destination workbench, custom schema management, and much more – for you to have total control.    
  • Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
  • 24×7 Customer Support: With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round the clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day full-feature free trial.
  • Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in data flow. 

What is Redshift UNPIVOT operator in Redshift?

Redshift UNPIVOT operator converts columns into rows. Let’s take the example of a table that looks like this:

IDStudent Name MathEnglishHistoryScience
1Son 54796598
2Marie67907686
Table 1

Now when we UNPIVOT, the new table will look as follows:

Student NameCourse Score
1SonMath54
2SonEnglish79
3SonHistory65
4SonScience98
5MarieMath67
6MarieEnglish90
7MarieHistory76
8MarieScience86
Table 2

The columns of the first table have been converted into rows. The second table is how the unpivoted table looks like 

Now, let’s check out Redshift UNPIVOT examples in Redshift.

Setup and use the sample data (given below) then execute the subsequent examples.

CREATE TABLE count_by_color (quality varchar, red int, green int, blue int);

INSERT INTO count_by_color VALUES ('high', 15, 20, 7);
INSERT INTO count_by_color VALUES ('normal', 35, NULL, 40);
INSERT INTO count_by_color VALUES ('low', 10, 23, NULL);

We will continue to Redshift UNPIVOT on input columns green, blue, and red.

SELECT * 
FROM (SELECT red, green, blue FROM count_by_color) UNPIVOT (
    cnt FOR color IN (red, green, blue)
);

The following output is shown.

 color | cnt 
-------+-----
 red   |  15
 red   |  35
 red   |  10
 green |  20
 green |  23
 blue  |   7
 blue  |  40

Note: By default, the NULL values in the input column remain skipped, not yielding any result row.

Now let’s UNPIVOT with INCLUDE NULLS

SELECT * 
FROM (
    SELECT red, green, blue 
    FROM count_by_color
) UNPIVOT INCLUDE NULLS (
    cnt FOR color IN (red, green, blue)
);

The following output is shown below.

color | cnt
——-+—–
red | 15
red | 35
red | 10
green | 20
green |
green | 23
blue | 7
blue | 40
blue |

When the INCLUDING NULLS parameter is set, results generated for NULL input values are as follow. The query below shows UNPIVOT with quality as an implicit column.

SELECT * 
FROM count_by_color UNPIVOT (
    cnt FOR color IN (red, green, blue)
);

The output for the above-mentioned query is shown below:

 quality | color | cnt 
---------+-------+-----
 high    | red   |  15
 normal  | red   |  35
 low     | red   |  10
 high    | green |  20
 low     | green |  23
 high    | blue  |   7
 normal  | blue  |  40

The next query shows the UNPIVOT operator with aliases for IN list’s values. 

SELECT * 
FROM count_by_color UNPIVOT (
    cnt FOR color IN (red AS r, green AS g, blue AS b)
);

The output for the above-mentioned query is shown below:

 quality | color | cnt 
---------+-------+-----
 high    | r     |  15
 normal  | r     |  35
 low     | r     |  10
 high    | g     |  20
 low     | g     |  23
 high    | b     |   7
 normal  | b     |  40

Conclusion

Through this tutorial article, we shed a light on Amazon Redshift PIVOT and UNPIVOT operators. In theory, their use case might seem basic, but in reality, their usage leaves cascading effects on the project management landscape as a whole.

For better and informed judgment, you can also refer to either of these three articles cited below:

Furthermore, if you are a data folk who needs a one-stop solution for data management and ETL-related tasks — Hevo has got your back! Hevo Data is a no-code data pipeline platform that helps new-age businesses integrate their data from multiple sources systems to a data warehouse and plug this unified data into any BI tool or Business Application. The platform provides 150+ ready-to-use integrations with a range of data sources and is trusted by hundreds of data-driven organizations from 30+ countries.

Visit our Website to Explore Hevo
Hevo Product Video

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

No-code Data Pipeline For Redshift