Amazon Redshift is a simple and cost-effective Data Warehousing Solution for collecting and analysing data for competitive and comparative analysis. With Columnar Storage Technology to improve I/O efficiency and parallelized queries across multiple nodes, Amazon Redshift also offers Redshift Functions that users can customise.

Known as User-Defined Functions (UDFs), these Redshift Functions can be executed just like other built-in Redshift Functions such as CURRENT_SCHEMA or Date functions like CURRENT_TIMESTAMP.

In this article, we will discuss the Redshift Functions Scalar Python UDF and Scalar SQL UDF in detail, with examples. We’ll also discuss ANYELEMENT, a polymorphic data type that can be used in these Redshift Functions.

Introduction to Amazon Redshift

Amazon Redshift is a Cloud-based, fully managed Petabyte-Scale Data Warehousing Service. Starting with a few hundred gigabytes of data, you may scale up to a petabyte or more. This allows you to gain fresh insights for your Company and Customers by analysing your data.

The first step in creating an Amazon Redshift Data Warehouse is to set up an Amazon Redshift Cluster, which is a collection of Machines. You can upload your Data Set and then run Data Analysis queries after you’ve provisioned your Cluster. Using the same SQL-based Tools and Business Intelligence Apps you’re already using, Amazon Redshift enables Rapid Query Performance regardless of the size of the Data Set.

Creating User-Defined Functions

A Custom Scalar User-Defined Function (UDF) can be built from a SQL SELECT clause or a Python Programme. The new function is saved in the Database and can be used by any user with sufficient privileges. Running a new Scalar UDF is similar to running existing Amazon Redshift Functions.

In addition to standard Python functionality, you can load your own Custom Python Modules into Python UDFs.

You can also use AWS Lambda UDFs in your SQL queries to employ Custom Functions defined in Lambda. Lambda UDFs let you create more complex UDFs and integrate them with third-party components. They can also help you overcome some of the current limitations of Python and SQL UDFs.

They can, for example, give you access to network and storage resources and let you write more complex SQL statements. Lambda UDFs can be written in any of Lambda’s Supported Programming Languages, including Java, Go, PowerShell, Node.js, C#, Python, and Ruby. You can even make your own runtime from scratch.
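
As a minimal sketch of how such a function is registered (the function name f_lambda_multiply, the Lambda function my-lambda-multiply, and the IAM role ARN below are placeholders, not real resources), the CREATE EXTERNAL FUNCTION command associates a SQL-callable function with a Lambda function:

create external function f_lambda_multiply (int, int)
returns int
volatile
lambda 'my-lambda-multiply'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftLambdaRole';

Once registered, it can be called from SQL like any other UDF, for example: select f_lambda_multiply(2, 3);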

Move Analytics-Ready Data to Amazon Redshift with Hevo

Hevo is a no-code data pipeline platform that not only loads data into your desired destination, like Amazon Redshift, but also enriches and transforms it into analysis-ready form without writing a single line of code.

Why Hevo is the Best:

  • Minimal Learning Curve: Hevo’s simple, interactive UI makes it easy for new users to get started and perform operations.
  • Connectors: With over 150 connectors, Hevo allows you to integrate various data sources into your preferred destination seamlessly.
  • Schema Management: Hevo eliminates the tedious task of schema management by automatically detecting and mapping incoming data to the destination schema.
  • Live Support: The Hevo team is available 24/7, offering exceptional support through chat, email, and calls.
  • Cost-Effective Pricing: Transparent pricing with no hidden fees, helping you budget effectively while scaling your data integration needs.

Try Hevo today and experience seamless data transformation and migration.

Sign up here for a 14-Day Free Trial!

Creating a Scalar Python UDF

A Scalar Python UDF contains a Python Programme that, when the UDF is called, executes and returns a single value. The Redshift command used here is CREATE FUNCTION, which creates a new Scalar User-Defined Function (UDF).

The syntax for the Redshift CREATE FUNCTION command is:

CREATE [ OR REPLACE ] FUNCTION f_function_name
( [ { py_arg_name py_arg_data_type | sql_arg_data_type } [ , ... ] ] )
RETURNS data_type
{ VOLATILE | STABLE | IMMUTABLE }
AS $$
  { python_program | SELECT_clause }
$$ LANGUAGE { plpythonu | sql }

The command defines the following parameters:

  • (Optional) Input Arguments. A name and a data type must be assigned to each argument.
  • One return data type.
  • One executable Python Programme.

SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE, or TIMESTAMP can be used as input and return data types. Additionally, Python UDFs can employ the data type ANYELEMENT, which Amazon Redshift translates to a standard data type at runtime based on the inputs supplied.

The following stages occur at runtime when an Amazon Redshift query calls a Scalar UDF:

  1. Amazon Redshift converts the supplied Input Arguments to Python data types.
  2. The translated Input Arguments are passed to the Python Programme, which is then run.
  3. The Python code returns a single value. The return value’s data type must match the RETURNS data type given in the function definition.
  4. The function converts the Python return value to the specified Amazon Redshift data type and returns it to the query, as the sketch below illustrates.
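
As a small, hedged illustration of steps 1 and 4 (the function name f_py_type_name is hypothetical), the UDF below accepts a DECIMAL argument, which Amazon Redshift converts to Python’s decimal.Decimal before the code runs, and returns the Python type name as a VARCHAR:

create function f_py_type_name (x decimal(8,2))
  returns varchar
stable
as $$
  # x arrives as Python's decimal.Decimal, converted from Redshift DECIMAL
  return type(x).__name__
$$ language plpythonu;

Calling select f_py_type_name(12.34); should return 'Decimal', reflecting the runtime conversion described above.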

Redshift Functions: Scalar Python UDF Example

The following example creates a Redshift Function that compares two numbers and returns the larger value. Python requires the code between the double dollar signs ($$) to be indented.

create function f_py_greater (a float, b float)
  returns float
stable
as $$
  if a > b:
    return a
  return b
$$ language plpythonu;

The following query calls the new f_py_greater function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

select f_py_greater (commission, pricepaid*0.20) from sales;

ANYELEMENT Data Type

ANYELEMENT is a polymorphic data type. When a Redshift Function is declared with ANYELEMENT as the data type for an argument, the function can accept any standard Amazon Redshift data type as input for that argument. When the function is called, the ANYELEMENT argument is set to the data type that was actually supplied to it.

When a Redshift Function with several ANYELEMENT arguments is called, all of those arguments must resolve to the same actual data type: the data type of every ANYELEMENT argument is set to the data type of the first argument supplied to it. For example, a Redshift Function defined as f_equal(anyelement, anyelement) will accept any two input values as long as they are of the same data type.

If a Redshift function’s return value is ANYELEMENT, at least one input argument must also be ANYELEMENT. The return value’s data type is the same as the ANYELEMENT input argument’s data type.
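
To make this concrete, here is a minimal sketch (the function name f_py_coalesce2 is hypothetical) of a Python UDF in which both arguments and the return value are declared ANYELEMENT; the two arguments must resolve to the same data type at call time, and the return type resolves to that same type:

create function f_py_coalesce2 (a anyelement, b anyelement)
  returns anyelement
stable
as $$
  # return the first non-null argument; both arguments and the return
  # value resolve to the same Redshift data type at call time
  return a if a is not None else b
$$ language plpythonu;

A call such as select f_py_coalesce2(commission, 0.0) from sales; resolves ANYELEMENT to the data type of COMMISSION, the first argument supplied.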

Creating a Scalar SQL UDF

A Scalar SQL UDF contains a SQL SELECT clause that, when the UDF is invoked, executes and returns a single value. The parameters for the CREATE FUNCTION command are as follows:

  • (Optional) Input Arguments. A data type must be assigned to each argument.
  • One return data type.
  • One SQL SELECT clause. Refer to the input arguments in the SELECT clause with $1, $2, and so on, in the order of the arguments in the function definition.

Any common Amazon Redshift data type can be used as the input and return data types.

Note: In your SELECT clause, don’t include a FROM clause. Instead, in the SQL statement that calls the SQL UDF, include the FROM clause.

The SELECT clause cannot include any of the following clauses (a short sketch follows the list):

  • WHERE
  • GROUP BY
  • ORDER BY 
  • LIMIT
  • FROM
  • INTO
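
As a hedged sketch of the FROM restriction (the function name f_double is hypothetical), the UDF body below contains no FROM clause; the FROM clause belongs in the query that calls the UDF:

create function f_double (float)
  returns float
stable
as $$
  select $1 * 2
$$ language sql;

The FROM clause then lives in the calling query, for example: select f_double(pricepaid) from sales;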

Scalar SQL UDF Example

The example below shows how to write a function that compares two numbers and returns the greater of the two. 

create function f_sql_greater (float, float)
  returns float
stable
as $$
  select case when $1 > $2 then $1
    else $2
  end
$$ language sql; 

The query below uses the new f_sql_greater function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

select f_sql_greater(commission, pricepaid*0.20) from sales;

Conclusion

This article presented information on Redshift Functions, primarily the Scalar Python UDF and the Scalar SQL UDF, along with insights into the polymorphic data type ANYELEMENT and examples to improve the reader’s understanding.

While using AWS Redshift services is insightful, setting up and managing the proper environment on a regular basis can be hectic. Extracting and integrating several heterogeneous sources into your Data Warehouse, like Amazon Redshift, is also a big task. To make things easier, Hevo comes to your rescue. Hevo Data is a No-code Data Pipeline with 150+ pre-built integrations that you can choose from. Sign up for Hevo’s 14-day free trial and experience seamless data migration.

FAQs

Does Redshift have functions?

Yes, Amazon Redshift supports user-defined functions (UDFs). These functions can be written in SQL or Python (using the plpythonu language).

Can I create a function in Redshift?

Yes, you can create functions in Amazon Redshift using SQL or Python. SQL functions can be created with the CREATE FUNCTION statement, and Python functions can be created using the CREATE FUNCTION statement with Python code.

What does AWS Redshift do?

Amazon Redshift is a fully managed, scalable data warehouse service that enables you to run complex queries and perform analytics on large datasets. It uses columnar storage and parallel query execution to efficiently handle large-scale data processing.

Muhammad Faraz
Technical Content Writer, Hevo Data

Muhammad Faraz is an AI/ML and MLOps expert with extensive experience in cloud platforms and new technologies. With a Master's degree in Data Science, he excels in data science, machine learning, DevOps, and tech management. As an AI/ML and tech project manager, he leads projects in machine learning and IoT, contributing extensively researched technical content to solve complex problems.