Fundamentals of Redshift Functions: Python UDF & SQL UDF with 2 Examples

on Amazon Redshift, Data Analytics, Data Warehouse, Data Warehouses, Database Management Systems, Python, SQL, Tutorials • September 30th, 2021 • Write for Hevo

The simple and cost-effective Data Warehousing Solution, Amazon Redshift offers a collection of information for competitive and comparative analysis. With Columnar Storage Technology to improve I/O efficiency and providing parallelized queries across multiple nodes, Amazon Redshift offers the feature of Redshift Functions for users that that be customised.

Known as User Defined Functions (UDF), these Redshift Functions can be executed similarly to built-in other Redshift Functions such as CURRENT_SCHEMA or Date functions like CURRENT_TIMESTAMP.

In this article, we will be discussing the Redshift Functions: Scalar Python UDF and Scalar SQL UDF with examples in detail. We’ll also discuss a polymorphic data type ANYELEMENT which can be used in these Redshift Functions.

Here’s the outline of the article:

Table of Contents

Introduction to Amazon Redshift

Redshift Logo: Redshift Functions
Image Source: Nightingale HQ

Amazon Redshift is a Cloud-based, fully managed Petabyte-Scale Data Warehousing Service. Starting with a few hundred gigabytes of data, you may scale up to a petabyte or more. This allows you to gain fresh insights for your Company and Customers by analysing your data.

The first step in creating an Amazon Redshift Data Warehouse is to set up an Amazon Redshift Cluster, which is a collection of Machines. You can upload your Data Set and then run Data Analysis queries after you’ve provisioned your Cluster. Using the same SQL-based Tools and Business Intelligence Apps you’re already using, Amazon Redshift enables Rapid Query Performance regardless of the size of the Data Set.

Redshift Functions: Creating User-Defined Functions

A SQL SELECT clause or a Python Programme can be used to build a Custom Scalar User-Defined Function (UDF). The new function is saved in the Database and can be used by any user with enough credentials. Running a new Scalar UDF is similar to running current Amazon Redshift Functions.

You can load your own Custom Python Modules into Python UDFs in addition to using Conventional Python functionality.

You may also use AWS Lambda UDFs in your SQL queries to employ Custom Functions defined in Lambda. You can use Lambda UDFs to create Complicated UDFs and integrate them with third-party components. They can also assist you in overcoming some of the current Python and SQL UDF constraints. 

They can, for example, assist you in gaining access to network and storage resources as well as writing more complex SQL statements. Any of Lambda’s Supported Programming Languages, including Java, Go, PowerShell, Node.js, C#, Python, and Ruby, can be used to create Lambda UDFs. You can even make your own runtime from scratch.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Salesforce, Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination like Amazon Redshift but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Redshift Functions: Creating a Scalar Python UDF

When a Scalar Python UDF is called, it contains a Python Programme that executes and returns a single value. The Redshift function used here is CREATE FUNCTION, which creates a new scalar User-Defined Function (UDF). 

Syntax for Redshift function CREATE FUNCTION is given as:

CREATE [ OR REPLACE ] FUNCTION f_function_name 
( { [py_arg_name py_arg_data_type |
sql_arg_data_type } [ , ... ] ] )
RETURNS data_type
{ VOLATILE | STABLE | IMMUTABLE }   
AS $$
  { python_program | SELECT_clause }
$$ LANGUAGE { plpythonu | sql }

which defines the following parameters:

  • (Optional) Arguments, which must be entered. A Name and a data type must be assigned to each argument.
  • One return data type.
  • One Python Programme that can be run.

SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE, or TIMESTAMP can be used as input and return data types. Additionally, Python UDFs can employ the data type ANYELEMENT, which Amazon Redshift translates to a standard data type at runtime based on the inputs supplied.

The following stages occur at runtime when an Amazon Redshift query calls a Scalar UDF:

  1. The supplied arguments are converted to Python data types using this Redshift function.
  2. The translated Input Arguments are passed to the Python Programme, which is then run.
  3. A single value is returned by the Python code. The return value’s data type must match the RETURNS data type given in the function specification.
  4. The function changes the Python return value to the Amazon Redshift data type chosen, then returns it to the query.

Redshift Functions: Scalar Python UDF Example

The following example creates a Redshift Function that compares two numbers and returns the larger value. The code must be indented between the double dollar signs ($$), which is a Python requirement. 

create function f_py_greater (a float, b float)
  returns float
stable
as $$
  if a > b:
    return a
  return b
$$ language plpythonu;

Source:https://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-scalar-udf.html

The following query calls the new f_greater function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

select f_py_greater (commission, pricepaid*0.20) from sales;

Source:https://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-scalar-udf.html

ANYELEMENT Data Type

A polymorphic data type is ANYELEMENT. When Redshift functions are declared with ANYELEMENT as the data type for an argument, the Redshift functions can accept any standard Amazon Redshift data type as input for that argument when it is called. When the function is called, the ANYELEMENT argument is set to the data type that was actually supplied to it.

When Redshift functions with several ANYELEMENT data types are called, they must all resolve to the same real data type. The data type of any ANYELEMENT argument is set to the data type of the first argument supplied to it. Redshift functions specified as f_equal(anyelement, anyelement), for example, will accept any two input values as long as they are of the same data type.

If a Redshift function’s return value is ANYELEMENT, at least one input argument must also be ANYELEMENT. The return value’s data type is the same as the ANYELEMENT input argument’s data type.

Redshift Functions: Creating a Scalar SQL UDF

When a Scalar SQL UDF is invoked, it includes a SQL SELECT clause that executes and returns a single value. The parameters for the CREATE FUNCTION command are as follows:

  • (Optional) Arguments that must be entered. A data type must be assigned to each argument.
  • There is only one return data type.
  • There is only one SELECT clause in SQL. Refer to the input arguments in the SELECT clause with $1, $2, and so on, in the order of the arguments in the function specification.

Any common Amazon Redshift data type can be used as the input and return data types.

Redshift Scalar SQL Function: Redshift Functions
Image Source: Trevor Fox

Note: In your SELECT clause, don’t include a FROM clause. Instead, in the SQL statement that calls the SQL UDF, include the FROM clause.

Any of the clauses listed below cannot be used in the SELECT clause:

  • WHERE
  • GROUP BY
  • ORDER BY 
  • LIMIT
  • FROM
  • INTO

Redshift Functions: Scalar SQL UDF Example

The example below shows how to write a function that compares two numbers and returns the greater of the two. 

create function f_sql_greater (float, float)
  returns float
stable
as $$
  select case when $1 > $2 then $1
    else $2
  end
$$ language sql; 

Source: https://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-scalar-sql-udf.html

The query below uses the new f_sql_greater function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

select f_sql_greater(commission, pricepaid*0.20) from sales;

Source: https://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-scalar-sql-udf.html

Conclusion

This article aimed to present the readers with information on Redshift functions, primarily the Scalar Python UDF and Scalar SQL UDF with insights into polymorphic data type ANYELEMENT and examples to improve the reader’s understanding regarding the same.

While using AWS Redshift Services are insightful, it is a hectic task to set up and manage the proper environment on a regular basis. Extracting and integrating several heterogeneous sources into your Data Warehouse like Amazon Redshift is also a big task. To make things easier, Hevo comes to your rescue. Hevo Data is a No-code Data Pipeline and has awesome 100+ pre-built Integrations that you can choose from.

Visit our Website to Explore Hevo

Hevo can help you integrate your data from numerous sources and load them into destinations like Redshift to Analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

Sign Up for a 14-day free trial and see the difference!

Share your experience of learning about the Redshift Functions in the comments section below.

No-code Data Pipeline for Amazon Redshift