The purpose of this tutorial is to show you how to run Federated Query Redshift. You may already know how to run queries on Amazon Redshift by loading your data into a Redshift cluster, but querying several databases without first moving their data into Redshift may be unfamiliar to you. Keep reading to learn how to run Federated Query Redshift.

Federated Query Redshift allows you to connect data in external databases to data within Amazon Redshift clusters so that you can run an integrated query across all of them. As simple as this may sound, Federated Query Redshift can seem complicated to a new user. This article will help you simplify the process.


Prerequisites

You can only run Federated Query Redshift if you’ve done the following:

  • You must have an Amazon Redshift cluster and know how to create one.
  • You must have Amazon RDS or Amazon Aurora instances with registered usernames and passwords.
  • Your Amazon Redshift cluster and your Amazon RDS and Amazon Aurora instances must be within the same VPC (Virtual Private Cloud).
  • Your Amazon Redshift cluster must be on a version that supports federated queries to Amazon Aurora and Amazon RDS for PostgreSQL and MySQL.
  • You must have access to an external data catalog like Amazon Athena or AWS Glue.

Introduction to Amazon Redshift


If you haven’t used Amazon Redshift in a while, it might help to recap the basics. Amazon Redshift is Amazon’s data warehouse where users can store and query their data. The results you receive from these queries can provide detailed insights into your business. 

Amazon Redshift stores data in the form of clusters. So, before you load data into the warehouse, you must create a cluster first. Rather than developing an entirely new interface, Amazon Redshift is based on PostgreSQL, so it works with standard SQL and supports familiar tools such as JDBC and ODBC drivers. This makes it very easy to use.

Another essential feature of this service is that it can connect to business intelligence tools and integrate databases from external sources. The Amazon Redshift feature that supports this function is Federated Query.

Introduction to Federated Queries

Federated query is an Amazon Redshift feature that lets you run queries over integrated databases. With Federated Query Redshift, you can combine live data from external Amazon RDS and Amazon Aurora databases, and data in Amazon S3, with data in Amazon Redshift clusters, and perform unified queries over all of them.

Federated Query Redshift is compatible with Amazon Aurora PostgreSQL and MySQL, as well as Amazon RDS (Relational Database Service) PostgreSQL and MySQL.
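To make this concrete, here is a minimal sketch of what a federated query looks like once setup is complete. The schema and table names (public.orders, pg_federated.customers) are hypothetical placeholders, with pg_federated standing in for the external schema you create later in this tutorial:

```sql
-- Join a local Amazon Redshift table with a live table in an
-- Aurora/RDS PostgreSQL database exposed through an external schema.
SELECT o.order_id,
       o.order_total,
       c.customer_name                 -- read live from the remote database
FROM   public.orders o                 -- local Redshift table
JOIN   pg_federated.customers c        -- external (federated) table
  ON   o.customer_id = c.customer_id;
```

Redshift pushes the filterable parts of such queries down to the remote database, so only the needed rows travel over the network.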


Benefits of Federated Query Redshift

  • You can easily run integrated queries that combine live external data with data already in Redshift.
  • Federated Query Redshift also allows you to ingest data into your Redshift clusters without building a separate ETL (Extract, Transform, Load) pipeline.

Working Principle of Federated Query Redshift


When you run a federated query on your account, the system first connects to an Amazon RDS or Amazon Aurora instance and collects metadata from the tables in the instance.

Next, it loads that metadata into the leader node, which compiles a query plan. Using predicate pushdown, Amazon Redshift then issues subqueries against the remote database so that only the rows the query actually needs are returned.

Finally, the compute nodes retrieve the result rows of the subqueries and distribute them across the cluster for quick, parallel processing.

After you’ve activated Federated Query Redshift, any queries sent to the linked Aurora and RDS instances will be logged in the SVL_FEDERATED_QUERY system view.
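For example, a quick way to review recent federated query activity is to query that view directly. This is a sketch; the columns selected here (query, sourcetype, recordtime, querytext) are the ones typically used for inspection:

```sql
-- List the ten most recent federated queries logged by Redshift.
SELECT query, sourcetype, recordtime, querytext
FROM   svl_federated_query
ORDER  BY recordtime DESC
LIMIT  10;
```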

Setting up Amazon Redshift Federated Queries

Before you start running federated queries, you need to connect your Amazon Redshift cluster to the Amazon Aurora PostgreSQL, Amazon RDS PostgreSQL, Amazon Aurora MySQL, or Amazon RDS MySQL instances you want to query.

Follow the steps below to link your Amazon Redshift account to Amazon RDS and Aurora:

Step 1: Create a Secret in AWS Secrets Manager for Aurora and RDS (PostgreSQL and MySQL)

Do the following to set up a secret in AWS Secrets Manager:

  • Log in to the AWS Secrets Manager Console.
  • Select ‘Store a New Secret’.
  • Next, click on ‘Credentials for database’. The system will ask for a username, password, and encryption key to secure the secret. Input the username and password of your Aurora or RDS instance.
  • Then, select an encryption key. AWS Secrets Manager offers a default encryption key, and it’s best to use it.
  • Create a name for the secret.
  • Leave the remaining fields at their default values, then click ‘Store’ at the end of the page.
  • Open your new secret and note the Secret ARN (Amazon Resource Name) value assigned to it. You will use the ARN to grant access to the new secret.

Step 2: Set Up a Security Policy

Log in to the AWS Management Console and open the IAM (Identity and Access Management) console.

  • Create a JSON policy with the following syntax, which grants Amazon Redshift read access to your secret (the Resource ARN is an example; use the Secret ARN from Step 1):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessSecret",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:ListSecretVersionIds"
                ],
                "Resource": "arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetRandomPassword",
                    "secretsmanager:ListSecrets"
                ],
                "Resource": "*"
            }
        ]
    }
  • Provide a name for the new policy
  • Browse the IAM Console to find IAM Roles
  • Set up a new IAM role for your Amazon Redshift account and link it to the IAM policy.
  • To finalize the setup process, go to the ‘Trust relationships’ tab on the new IAM role and check that it trusts the Amazon Redshift service principal (redshift.amazonaws.com).
  • Just like AWS Secrets Manager assigned you a Secret ARN, the IAM console will assign the role a Role ARN.
  • The Role ARN gives Amazon Redshift access to both the IAM role and the connected secret.

Step 3: Connect Your IAM Role to Your Amazon Redshift Cluster

  • Log in to the Amazon Redshift Console.
  • Go to the navigation menu and select ‘Clusters’. You’ll find all the clusters that you’ve ever created in your AWS account.
  • Choose the cluster you want to work on.
  • Then, click on the ‘Actions’ tab and select ‘Manage IAM Roles’. The system will direct you to the Manage IAM Roles Page. 
  • Add the new IAM Role you’ve just created to the cluster.

Step 4: Link Specific Aurora and RDS Databases to Amazon Redshift 

You have successfully linked your Amazon Aurora and RDS accounts to your Amazon Redshift. Now, it’s time to connect the actual databases in the instances to your Amazon Redshift account. You can link these external databases to your Redshift cluster with an external schema.

To create an external schema for your federated queries, you must run the ‘CREATE EXTERNAL SCHEMA’ command in your Amazon Redshift cluster. The external schema maps the remote database into Redshift.

That said, the syntax for conducting Federated Query Redshift with MySQL databases slightly differs from the syntax for using federated queries with PostgreSQL databases. 

The syntax for making Federated Query Redshift with Amazon Aurora MySQL and Amazon RDS MySQL:

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM MYSQL
DATABASE 'federated_database_name'
URI 'hostname' [ PORT port_number ]
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
SECRET_ARN 'ssm-secret-arn'
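As a concrete sketch, the statement below creates an external schema for a hypothetical Aurora MySQL instance. Every value (schema name, database name, hostname, account ID, role name, and secret ARN) is a placeholder to replace with your own:

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS mysql_federated
FROM MYSQL
DATABASE 'example_db'
URI 'my-aurora-mysql.cluster-abc123xyz.us-west-2.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy';
```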

The syntax for making federated queries with Amazon RDS PostgreSQL and Amazon Aurora PostgreSQL:

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM POSTGRES
DATABASE 'federated_database_name' [SCHEMA 'schema_name']
URI 'hostname' [ PORT port_number ]
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
SECRET_ARN 'ssm-secret-arn'
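And a matching sketch for a hypothetical RDS PostgreSQL instance (again, all names, hostnames, and ARNs are placeholders). Note the optional SCHEMA clause, which maps a specific PostgreSQL schema and defaults to public if omitted:

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS pg_federated
FROM POSTGRES
DATABASE 'example_db' SCHEMA 'sales'
URI 'my-rds-postgres.abc123xyz.us-west-2.rds.amazonaws.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy';
```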

Now you can start running your Federated Query Redshift.


In this article, you learnt about Federated Query Redshift and how to conduct federated queries across Amazon Redshift, MySQL, and PostgreSQL. Now, you can start running your Federated Query Redshift with ease. Consult the Amazon Redshift documentation for more information, and contact our support team if you encounter any problems during this process.

Visit our Website to Explore Hevo

Amazon Redshift stores data from multiple sources, and every source follows a different schema. Instead of manually writing scripts for every source to perform the ETL (Extract, Transform, Load) process, one can automate the whole process. Hevo Data is a No-code Data Pipeline solution that can help you transfer data from 100+ sources to Amazon Redshift or another Data Warehouse of your choice. Its fault-tolerant and user-friendly architecture fully automates the process of loading and transforming data to your destination without writing a single line of code.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about the Federated Query Redshift in the comments section below!

Isola Saheed Ganiyu
Technical Content Writer, Hevo Data

Isola is a freelance writer specializing in the domains of data integration and data analysis. He has a passion for creating engaging and educational content, breaking down sophisticated subjects for easy understanding.
