Redshift generate_series() Functions Made Easy

|

Redshift Generate_Series Feature Image

Several charts, tables, and dashboards are created using Time Series. Usually, they are easier to query when you can join and aggregate based on a list of dates. Despite this, you may be surprised to learn that Amazon Redshift did not support the Generate Series function before PostgreSQL 8.4. There are no references to the series in Amazon Redshift’s official documentation, but there is a workaround.

The Redshift generate_series() function can be used in Amazon Redshift, though it is still available with some limitations. Before we start using Redshift generate_series(), let’s discuss Amazon Redshift, how it works, and its features.

Table of Contents

What is Amazon Redshift?

Redshift generate_series: Redshift
Image Source: www.blazeclan.com

Amazon’s Data Warehouse solution supports Business Intelligence in the Cloud. Redshift is Amazon’s Data Warehouse hosted on Amazon Web Services. Amazon Redshift is the most commonly used cloud data warehouse, with tens of thousands of users using it to analyze exabytes of data and conduct complicated analytical queries. Without having to manage your data warehouse architecture, you can run and scale analytics on all of your data in seconds. Customers can query Petabytes of structured and semi-structured data via Redshift’s SQL-based query language.

Amazon Web Services offers Redshift Data Warehouses for as low as $0.25 per hour, so customers can build a data warehouse that fits their business needs.

Do You Know How Amazon Redshift Works?

Redshift generate_series: Architecture
Image Source: www.docs.aws.amazon.com

Depending upon your needs, Redshift can either be deployed as an individual 160GB node or as a multi-node clustered solution with a ‘Leader’ node handling client connections and receiving queries, which is connected to up to 128 Compute Nodes that actually store and process data.

Redshift uses advanced compression technology to compress individual Database Columns significantly compared to traditional Relational Databases. As a result, data stored in Redshift uses less storage space than data stored in competing technologies.

In Redshift, workloads and data are automatically distributed across all compute nodes through Massive Parallel Processing (MPP) technology, allowing complex queries against large datasets to be resolved quickly and efficiently.

Simplify Amazon Redshift Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up data integration from 100+ Data Sources (including 30+ Free Data Sources) and will let you directly load data to a Data Warehouse like Amazon Redshift or a destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.

Get started with hevo for free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100’s of sources that can help you scale your data infrastructure as required.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

Amazon Redshift Features

Below are the features responsible for the immense popularity of Amazon Redshift.

Scalability 

The AWS Redshift Data Warehouse can be scaled in various ways to meet growing demands from Data Scientists. The first feature is that AWS Redshift Data Spectrum can perform queries on data stored in Amazon S3 Data Lakes without requiring the data to be loaded into Redshift. Therefore, Data Scientists will not need to wait for ETL jobs to be complete to perform queries.

The most crucial benefit of Redshift is that it can be scaled in a matter of hours, unlike traditional on-premises Data Warehouse solutions that may take days to scale.

The capacity of a Redshift Data Warehouse can also be rapidly increased by deploying a new one from a snapshot. The Redshift Elastic Resize feature allows you to add or remove nodes in minutes, taking advantage of peaks and troughs in demand to let your cluster grow and shrink as needed.

Fault-Tolerant

Data Warehouse clusters can be made more reliable by using multiple features. Using Amazon Redshift as an example, the service keeps track of cluster health, re-replicates data from failed disks automatically, and replaces nodes necessary for fault tolerance. 

Clusters can also be moved to an alternative Availability Zone (AZ) without losing data or requiring any changes to the application.

Security

Redshift keeps three copies of your data in a Redshift Data Warehouse. It can own one day’s worth of backups by default, but this can be extended up to 35 days. In addition to replicating data to Amazon S3 in another AWS region, Redshift can also make disaster recovery asynchronously.

Data at rest and data in transit in Redshift Data Warehouses is encrypted. Encryption of data at rest is handled by the AES-256 algorithm, while encryption of data in transit is dealt with by SSL. Redshift manages encryption keys by default, but customers can also manage their keys using the AWS Key Management Service (KMS) or a Hardware Security Module (HSM).

Automated Table Design

To reduce the time it takes for users to execute queries, Amazon Redshift uses sophisticated algorithms to monitor their workloads and find ways to optimize the physical layout of data. For optimal performance of the cluster, Automatic Table Optimization selects the most efficient sorting and distribution keys. 

Tables will be automatically altered without administrator intervention when Amazon Redshift determines that using a key enhances cluster performance. In addition to Automatic Vacuum Delete, Automatic Table Sort, and Automatic Analyze, there is no need to manually maintain and tune Redshift clusters to obtain maximum performance for new clusters and production workloads.

Resilient 

Currently, Redshift can be deployed as a single availability zone. Redshift snapshots, however, can be deployed into a different availability zone to swiftly restore service after an outage.

Flexible Pricing

Redshift generate_series: Pricing
Image Source: www.hevodata.com

Redshift is billed based on usage, just like all AWS services. It charges an hourly rate for compute nodes, data backups, and data transfers. There are no fees associated with the Leader node in a Redshift cluster.

In contrast to other Cloud Data Warehouses, Amazon Redshift is at least 50% cheaper than these different solutions, with 1TB of data costing only $1,000 per year.

Serverless Technology

A serverless option of Amazon Redshift enables analytics to be run without the need to set up and maintain Data Warehouses in seconds with Amazon Redshift Serverless (preview). Anyone can gain insights from data by merely loading and querying data using Redshift Serverless — from data analysts to developers to business professionals to data scientists.

Redshift generate_series() Function

The Redshift generate_series() function generates integer, time series, etc. The following examples will help you understand Redshift generate_series() better.

Syntax 

Amazon Redshift does not have documentation on how to generate series functions. Nevertheless, below is the syntax for the PostgreSQL Generate Series function; it will work on Redshift.

generate_series(start, stop, step)

This function generates a series of values, starting at start and stopping at a stop, with a step size.

Redshift generate_series() Function Example

The following is an example of how to generate integer series. You can start by developing sequential integer series beginning at 1.

dev=# select generate_series(1,5);
generate_series
-----------------
                1
                2
                3
4
5
(5 rows)

Generating Series in Reverse Order

A series can also be generated in reverse order. In the example below, the values are generated in reverse order.

select * from generate_series(10, 1, -1);
generate_series
-----------------
10
                9
8
7
6
5
                4
                3
                2
                1
(10 rows)

Generating Date Series in Redshift

You can create a series of ten days in dashboards and KPIs in the example below.

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,10) i 
ORDER BY 1
    date_datetime
---------------------
 2019-09-27 00:00:00
 2019-09-28 00:00:00
 2019-09-29 00:00:00
 2019-09-30 00:00:00
 2019-10-01 00:00:00
 2019-10-02 00:00:00
 2019-10-03 00:00:00
 2019-10-04 00:00:00
 2019-10-05 00:00:00
 2019-10-06 00:00:00
(10 rows)

Creating a Series Function with an INSERT Statement

As far as inserting rows to a table is concerned, you can’t use this function with the Redshift insert command, as it is a Leader-Node-Only function, while INSERTs work on the Compute Nodes. This function is not known to the compute node.

Take a look at the following insert command, for example.

dev=# insert into test select * from(select generate_series(1,3) as col1) as a;
INFO:  Function "generate_series(integer,integer)" not supported.
ERROR:  Specified types or functions (one per INFO message) not supported on Redshift tables.

The example above shows that compute nodes does not recognize the generate_series() function. Thus, Redshift generate_series() cannot be combined with compute nodes.

Conclusion

Redshift is Amazon’s Data Warehouse hosted on Amazon Web Services, and customers can query Petabytes of structured and semi-structured data via Redshift’s SQL-based query language. The Redshift generate_series() function generates integer, time series, etc. This article introduced you to Amazon Redshift and helped you understand the Redshift generate_series(). However, extracting data from a diverse set of sources and loading it into a Data Warehouse like Amazon Redshift can be a tedious and daunting task. This is where Hevo saves the day for you.

visit our website to explore hevo

Hevo Data with its strong integration with 100+ Sources & BI tools such as Amazon Redshift, allows you to not only export data from sources & load data in the destinations, but also transform & enrich your data, & make it analysis-ready so that you can focus only on your key business needs and perform insightful analysis using BI tools.

Give Hevo Data a try and sign up for a 14-day free trial today. Hevo offers plans & pricing for different use cases and business needs, check them out!

Share your experience of working with Redshift generate_series() in the comments section below.

Samuel Salimon
Freelance Technical Content Writer, Hevo Data

Samuel specializes in freelance writing within the data industry, adeptly crafting informative and engaging content centered on data science by merging his problem-solving skills.

No-code Data Pipeline for Amazon Redshift