Redshift is a completely managed data warehouse service offered by Amazon Web Services. It is known for its ability to scale seamlessly, support petabytes of data storage, and super-fast querying ability. Redshift is available based on a subscription payment model.

All these features make it a great choice for everyone from small and medium businesses to large enterprises.

Redshift’s querying layer is based on the PostgreSQL query standard. It supports most functions supported by PostgreSQL and organizations who are already using Redshift for their transactional or warehousing loads generally find it a natural extension of their architecture.

Introduction to Redshift

Under its hood, Redshift has a massively parallel processing architecture. Redshift clusters can be configured for different kinds of workloads as per the requirement.

Redshift Pricing is broadly affected by four factors:

  1. The node type that the customer chooses to build his cluster.
  2. The region where the cluster is deployed.
  3. Billing strategy – on-demand billing or a reserved pricing strategy.
  4. Use of Redshift Spectrum.

Redshift Date functions

  • Redshift’s date and time functions help the customers to transform or manipulate data that is stored in the Redshift table with the data type as DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ. Before going to date and time functions, let us learn what each of these data types represents.

DATE: Used for storing date data.
TIME: Used for storing time with hours, minutes, and seconds without explicitly mentioning time zone.
TIMETZ: Used to store time with timezone.
TIMESTAMP: Used to store the date and time of the day without a time zone. 
TIMESTAMPTZ: Used to store the date and time of the day with time zone information. 

  • Now, let us move to the date and time functions that will help you manipulate the data stored in these data types.
  • To help us understand the operators better, let’s first create a table with the date data type. This table will help us use the operators and watch the results. Use the Reds
CREATE TABLE SUBSCRIPTION( product_id integer, stard_date date, end_date date);

You can then insert a few rows using the below statements.

INSERT INTO SUBSCRIPTION VALUES ( 1, ‘2020-01-12’ , ‘2023-02-12’);
INSERT INTO SUBSCRIPTION VALUES ( 2, ‘2008-01-12’ , ‘2011-02-12’);

Redshift Date functions: Concatenation (+) Operator

This operator helps one to combine date and time to return a timestamp. For example, we know that our SUBSCRIPTION table only has date information in the start_date field. To add default time information to each date, you can use the below query.

SELECT start_date + time ‘20:00:00’ as start_timestamp from SUBSCRIPTION;

The output will be as follows.

start_timestamp
———————–
‘2020-01-12 20:00:00’
‘2008-01-12 20:00:00’

The operator works in the same manner for the TIMETZ data type too.

Redshift Date functions: ADD_MONTHS function

  • This function takes a date and an integer as arguments. It returns the date after adding the number of specified months.
  • For example, if you want to find the date after 12 months from the start_date from our SUBSCRIPTION table, you can use the below query.
SELECT ADD_MONTHS(start_date,12) as next_date FROM SUBSCRIPTION;

The output will be as follows.

next_date
———————–
2021-01-12
2009-01-12

Redshift Date functions: AT_TIMEZONE function

This function converts a timestamp value from one time zone to another time zone. For example,

SELECT TIMESTAMP WITH TIME ZONE '2021-03-18 21:39:42-05' AT TIME ZONE 'MST';

In the above statement, the timezone of the selected value is UTC – 5, To convert that to MST ( Which is UTC -7 ) you can use the AT_TIMEZONE function.

The output of the statement will be as follows.

 timestamp
————————
‘2021-03-18 19:39:42’

Redshift Date functions: CONVERT_TIMEZONE function

As the name suggests, this function converts a timezone to another timezone. For example, let’s say we had a table called CUSTOMER with a last_activity timestamp field. The default timezone of Redshift is UTC. To select the last_activity value in MST timezone, you can use the CONVERT_TIMEZONE as below.

SELECT last_activity,CONVERT_TIMEZONE(last_activity,’MST’) as last_activity_mst FROM CUSTOMER;

The output will be as below.

last_activity                 | last_activity_mst
—————————- + ———————–
2021-09-24 10:36:13     2021-09-24 02:36:13

Redshift Date functions: Comparison functions

The DATE_CMP function takes two dates as arguments and compares them. It returns a 0 if the dates are the same, -1 if the first date is earlier and 1 if the first date is later. 

Using our SUBSCRIPTION table let’s execute a query to check the subscriptions that are earlier than 2010-01-01.

SELECT product_id, start_date, DATE_CMP(start_date,’2010-01-01’) as compare_status;

The output will be as follows.

product_id,start_date,compare_status
1, 2020-01-12, -1
2, 2008-01-12, 1

Similar to DATE_CMP, there is DATE_CMP_TIMESTAMP and DATE_CMP_TIMESTAMPTZ that can compare a date with a timestamp or a timestamp with timezone respectively.

Similarly, there is the TIMESTAMP_CMP function that takes two timestamp values as arguments and compares them. To compare a timestamp with date or timestamp with timezones, Redshift provides TIMESTAMP_CMP_DATE and TIMSTAMP_CMP_TIMESTAMPTZ functions.

To compare two timestamps with timezones, there is the TIMESTAMPTZ_CMP function. If you want to compare a timestamp with a timezone to a timestamp or a date, you can use TIMETSTAMPTZ_CMP_TIMESTAMP and TIMESTAMPTZ_CMP_DATE functions.

Finally, there is an INTERVAL_CMP function that can compare two intervals and return a 0, 1, or -1 based on the comparison. For example, consider the below statement.

SELECT INTERVAL_CMP(‘5 days’,’2 year’);

The above statement will return a -1 as the output. 

Redshift Date functions: PART Functions

Redshift provides two functions to extract part of the information from date values. The DATE_PART_YEAR function returns the year part of the date value. DATE_PART function is a generic function that allows you to get a specific part of the date value based on the first argument. For example, to get the year information from a date value, you can use the below statement.

SELECT DATE_PART(year,start_date) as year, start_date from SUBSCRIPTION

The output will be as follows.

year,start_date
2020,2020-01-12
2008,,2008-01-12

The DATE_TRUNC function operates on timestamp data type and provides the first day of the specified year, the first day of the specified month, or Monday of the specified week depending on the literal that was specified as the first argument. For example, to get the Monday of the specified week, you can use the argument as ‘week’

There is another function called EXTRACT that can be used similarly to DATE_PART except that it works for all kinds of data types like TIMESTAMP, TIMESTAMPTZ, TIME, etc. 

Redshift Date functions: DATE_ADD function

  • This function takes three arguments. The first argument is a literal value that specifies whether to add weeks, months, years, days, or any other unit of time.
  • The other two arguments are the quantity to be added and the actual date or timestamp value to which addition has to be performed.
  • For example, if you want to add 18 months to the start_date column in our SUBSCRIPTION table, the statement will look like below.
SELECT start_date, DATE_ADD(months,12,start_date) as next_date FROM SUBSCRIPTIONS;

The output of the above statement will be as follows.

start_date,next_date
2020-01-12, 2021-07-12
2008-01-12,,2009-07-12

Redshift Date functions: DATE_DIFF function

  • This function takes two date values and returns the difference between them in the unit specified by the ‘date part’ argument.
  • For example, to find the difference between two dates in weeks, use the below statement.
SELECT DATEDIFF(week,'2019-01-01','2019-12-31') as numweeks;

The output will be the number of weeks between the two dates.

numweeks
——————
52

There is also a specific MONTHS_BETWEEN function that returns the months between the two date values.

Redshift Date functions: NEXT_DAY function

This function can be used to get the date of the next week of the day after a specific date. For example, if you want to get the date of next Monday after a specific date, you can use the statement below.

SELECT NEXT_DAY('2024-08-20','Friday');

Redshift Date functions: LAST_DATE function

This function returns the last day of the month on a date. An example could be a statement like the below one.

SELECT LAST_DAY(‘2020-08-20’);

The output will be 2020-08-31 which is the last day of the 8th month. 

Redshift Date functions: Current date functions

Redshift provides two functions to get the current date. The SYSDATE function returns the current date and time in the current session timestamp. If placed inside a transaction, it returns the start of the transaction and not the actual statement execution time.

The GETDATE function also returns the current date and time. When placed inside a transaction, it returns the execution time of the statement rather than the start of the transaction.

Conclusion

  1. While scalability and performance are a must for a data warehouse, it is the richness of the querying engine that ultimately decides its worth for the analysts.
  2. Redshift comes with a rich set of functions and operators that make it easy for analysts to handle all kinds of queries supported by traditional databases.
Talha
Software Developer, Hevo Data

Talha is a Software Developer with over eight years of experience in the field. He is currently driving advancements in data integration at Hevo Data, where he has been instrumental in shaping a cutting-edge data integration platform for the past four years. Prior to this, he spent 4 years at Flipkart, where he played a key role in projects related to their data integration capabilities. Talha loves to explain complex information related to data engineering to his peers through writing. He has written many blogs related to data integration, data management aspects, and key challenges data practitioners face.

All your customer data in one place.