Working with Redshift Data APIs Simplified 101

on Amazon Redshift, API, Data Warehouses, redshift architecture, Redshift Data APIs, Tutorials • September 13th, 2021


The Amazon Redshift Data API enables you to painlessly access data from Amazon Redshift in all types of traditional, cloud-native, containerized, and serverless web service-based and event-driven applications. The following diagram illustrates this architecture.


Introduction to Redshift

Redshift Data API: Redshift Logo
Image Source: www.blazeclan.com

AWS Redshift is a cloud-based data warehouse provided by Amazon as a part of Amazon Web Services. It is a fully managed and cost-effective data warehouse solution. AWS Redshift is designed to store petabytes of data and can perform real-time analysis to generate insights.

AWS Redshift is a column-oriented database: it stores data in a columnar format, in contrast to traditional databases that store data in a row format. Amazon Redshift has its own compute engine to perform computations and generate critical insights.

Learn more about AWS Redshift.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps you load data from any data source, such as Salesforce, Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources like Salesforce), and setup is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without your having to write a single line of code.

GET STARTED WITH HEVO FOR FREE

Its completely automated pipeline delivers data in real time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
SIGN UP HERE FOR A 14-DAY FREE TRIAL

AWS Redshift Architecture

AWS Redshift has a straightforward architecture. It contains a leader node and a cluster of compute nodes that perform analytics on data. The snapshot below depicts the schematics of the AWS Redshift architecture:

Redshift Data Api: Redshift Architecture
Image Source: docs.aws.amazon.com

AWS Redshift offers JDBC and ODBC connectors so that client applications written in major programming languages like Python, Scala, Java, and Ruby can interact with it.

Key Features of AWS Redshift

  1. Redshift allows users to write queries and export the data back to a Data Lake.
  2. Redshift can seamlessly query files in formats like CSV, Avro, Parquet, JSON, and ORC directly, using ANSI SQL.
  3. Redshift has exceptional support for Machine Learning, and developers can create, train, and deploy Amazon SageMaker models using SQL.
  4. Redshift’s materialized views allow you to achieve faster query performance for ETL, batch job processing, and dashboarding.
  5. Redshift has a petabyte-scalable architecture, and it scales quickly as per need.

Introduction to an API

An Application Programming Interface (API) is a software bridge that lets two different applications communicate to perform designated tasks. Everyday examples include sending a WhatsApp message, browsing Facebook, or searching for a song on Google; nearly everything we do on a computer involves an API behind the scenes. In most cases, an API is the bridge between the User Interface and the Backend.

The below schematic shows the working methodology of an API – 

Redshift Data Api: API process
Image Source – help.eset.com

Types of API

  1. By Security Policy, APIs can be Private or Public.
  • Private API – Private APIs are internal to an organization and can only be used within it. The organization’s developers and contractors can use these APIs to develop new solutions, leverage existing systems, and more. Private APIs that are also made available to partnered vendors are called Partner APIs.
  • Public API – Public APIs are external-facing APIs that are available to third-party developers. The Facebook Developers API is one example of a Public API: it allows third-party developers to interact with Facebook applications, retrieve data, and use it in their own products. Public APIs can have an open free tier or be commercial (pay-as-you-go or subscription).
  2. By Use Case, common API categories are as follows.
  • Database API – Database APIs enable communication between a database and another application. One such example is the Redshift Data API, which lets you create tables, run SELECT queries, list databases, and more from the command line.
  • Operating System API – Operating System APIs, such as the Windows, Linux, and macOS APIs, are used for building services and applications on different operating systems.
  • Remote API – Remote APIs connect two applications running remotely, i.e., on different machines over a network. Remote machines are often connected via the Internet, so remote APIs are typically written against web-standard protocols (SOAP or REST).
  • Web API – Web APIs are the most common APIs in use today. A Web API follows a client-server architecture, delivering requests from web applications and responses from servers using the Hypertext Transfer Protocol (HTTP).

Developers can use Web APIs to extend the functionality of their apps or sites. For example, the Facebook Developers API can be used to extract information, and the Google Maps API can be used to embed maps and location data.

Key Features of APIs

  1. APIs are a simple and effective way of establishing communication between two applications.
  2. APIs are language-agnostic, which means users can use any language to interact with an application that provides API access.
  3. APIs are a secure way of establishing connections over HTTPS.
  4. REST APIs are stateless and easy to program with via standard HTTP methods such as GET, POST, PUT, and DELETE.

Steps to Set Up Amazon Redshift Data APIs

The Amazon Redshift Data API exposes Redshift operations over an API, so users can perform various operations via the AWS CLI (command-line interface) without managing persistent database connections or JDBC/ODBC drivers.

Step 1: Authorizing Access to the Amazon Redshift Data API

To use the Redshift Data API, the user must be authorized. This can be done by registering the user and attaching the relevant policy on the AWS IAM console.

The policy required to gain full access to the Redshift Data API is named AmazonRedshiftDataFullAccess on the AWS IAM page. This policy also provides access to AWS Secrets Manager and API access to the Redshift cluster.

Follow the official guide to get more information on IAM roles and attaching policies.

Step 2: Storing Database Credentials in AWS Secrets Manager

When making a call to the Redshift Data API, the credentials needed to access the database can be stored in AWS Secrets Manager; you can then pass them to the API by referencing the stored secret.

Steps to store credentials in Secrets Manager:

  1. Log in to the AWS console and choose the AWS Secrets Manager service.
  2. You need the SecretsManagerReadWrite role to be able to create secrets.
  3. Click on Store a New Secret and choose Credentials for Redshift Cluster.
  4. Store your values for User name (database user), Password, and DB cluster (cluster identifier) in your secret.
  5. Tag the secret with the key RedshiftDataFullAccess. The AWS-managed policy AmazonRedshiftDataFullAccess only allows the action secretsmanager:GetSecretValue for secrets tagged with the key RedshiftDataFullAccess.
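The same secret can also be created programmatically. A minimal sketch using boto3’s Secrets Manager client follows; the secret name, credential values, and the JSON key names (`username`/`password`/`dbClusterIdentifier`, mirroring the console’s Redshift credential layout) are illustrative assumptions, and the client is injected so the sketch can be exercised without AWS access:

```python
import json

def build_redshift_secret(client, name, username, password, cluster_id):
    """Create a Redshift credentials secret tagged so that the
    AmazonRedshiftDataFullAccess policy permits secretsmanager:GetSecretValue.

    `client` is expected to be a boto3 "secretsmanager" client, e.g.
    boto3.client("secretsmanager"); passing it in keeps the sketch testable.
    """
    secret_string = json.dumps({
        "username": username,               # database user
        "password": password,
        "dbClusterIdentifier": cluster_id,  # cluster identifier
    })
    return client.create_secret(
        Name=name,
        SecretString=secret_string,
        # The managed policy only allows reading secrets carrying this tag key.
        Tags=[{"Key": "RedshiftDataFullAccess", "Value": "true"}],
    )
```

The returned response contains the secret’s ARN, which is the value you pass as `--secret-arn` in the CLI calls below.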

Step 3: Configuring Authorization Credentials & Calling the API

Once the above steps are successful, you can use the Redshift Data API from the AWS CLI to interact with the Redshift cluster.

To set up the AWS CLI, you can follow the official documentation.

After setting up the AWS CLI, you can start exploring the Redshift Data API with:

aws redshift-data help

This command lists all the subcommands available in the Redshift Data API CLI.

Commands and descriptions:

  • list-databases – Lists the databases in a Redshift cluster.
  • list-schemas – Lists the schemas in a database. You can filter this by a matching schema pattern.
  • list-tables – Lists the tables in a database.
  • describe-table – Describes a table, including column metadata.
  • execute-statement – Runs a SQL statement, which can be SELECT, DML, DDL, COPY, or UNLOAD.
  • batch-execute-statement – Executes multiple SQL statements in a batch, i.e., one after another.
  • cancel-statement – Cancels a running query.
  • describe-statement – Describes the details of a specific SQL statement run. The information includes when the query started and finished, the number of rows processed, and the SQL statement.
  • list-statements – Lists the SQL statements. By default, only finished statements are shown.
  • get-statement-result – Fetches the temporarily cached result of the query. The result set contains the complete result and the column metadata. You can paginate through a set of records to retrieve the entire result as needed.
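These subcommands are also exposed programmatically through the AWS SDKs; in Python, boto3’s `redshift-data` client offers matching methods (`list_databases`, `list_schemas`, `execute_statement`, and so on). A hedged sketch of the `list-schemas` equivalent, with the client injected so the function can be exercised without a live cluster:

```python
def list_schemas(client, cluster_id, secret_arn, database, pattern=None):
    """Mirror `aws redshift-data list-schemas`, optionally filtering
    the result by a schema pattern such as "test%".

    `client` is expected to be boto3.client("redshift-data").
    """
    kwargs = {
        "ClusterIdentifier": cluster_id,
        "SecretArn": secret_arn,
        "Database": database,
    }
    if pattern is not None:
        kwargs["SchemaPattern"] = pattern
    return client.list_schemas(**kwargs)["Schemas"]
```

Injecting the client rather than constructing it inside the function is a small design choice that keeps the sketch unit-testable with a fake client.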

Now that we know the basics of the Redshift Data API, let’s see some examples of the commands available in it. Before that, navigate to Secrets Manager and note your secret’s ARN, which is used to authorize the Redshift Data API calls.

1) List databases

The list-databases command lists the databases available in the Redshift cluster. You can choose one of these databases to perform later operations such as creating a table or running a SQL statement.

Syntax – 

aws redshift-data list-databases  --cluster-identifier <your-cluster-id> --secret-arn  <provide-your-secret-ARN> --region <your-aws-redshift-region>

To execute the query, replace the <> with actual values.

2) List Schema

The list-schemas command lists the schemas present in a particular database.

Syntax – 

aws redshift-data list-schemas  --cluster-identifier <your-cluster-id> --secret-arn <provide-your-secret-ARN> --database <your-db-name> --region <your-aws-redshift-region>

You can also specify a schema pattern to filter the output to schemas matching that pattern.

Syntax – 

aws redshift-data list-schemas --database <your-db-name> --cluster-identifier <your-cluster-id> --secret-arn <provide-your-secret-ARN> --region <your-region> --schema-pattern "test%"

To execute the query, replace the <> with actual values.

3) List Tables

The Redshift Data API provides a command to list the tables in a specified database. It also allows you to filter the tables based on a pattern to limit the query output.

Syntax – 

aws redshift-data list-tables --database <your-db-name> --cluster-identifier <your-cluster-id> --secret-arn <your-secret-arn> --region <your-region> --table-pattern "ven%"

You can also restrict the listing to tables in schemas matching a specific pattern:

aws redshift-data list-tables --database <your-db-name> --cluster-identifier <your-cluster-id> --secret-arn <your-secret-arn> --region <your-region> --table-pattern "ven%" --schema-pattern demo

4) Execute Statements

The execute-statement command allows you to run SQL queries against Redshift databases and tables.

Syntax – 

aws redshift-data execute-statement \
    --database <your-db-name> \
    --cluster-identifier <your-cluster-id> \
    --secret-arn <your-secret-arn> \
    --sql "CREATE SCHEMA test;" \
    --region <your-region>

The above syntax shows how to run SQL statements; any valid SQL can be executed this way against Redshift tables and databases.
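Note that execute-statement is asynchronous: it returns a statement Id immediately rather than the query result. A hedged Python sketch of the submit-and-poll pattern, using boto3’s `execute_statement`/`describe_statement` method names with an injected client so the logic can be tested without AWS:

```python
import time

def run_sql(client, cluster_id, secret_arn, database, sql, poll_seconds=1.0):
    """Submit a statement and block until it reaches a terminal status.

    Returns the statement Id on FINISHED; raises on FAILED or ABORTED.
    `client` is expected to be boto3.client("redshift-data").
    """
    stmt = client.execute_statement(
        ClusterIdentifier=cluster_id,
        SecretArn=secret_arn,
        Database=database,
        Sql=sql,
    )
    while True:
        desc = client.describe_statement(Id=stmt["Id"])
        status = desc["Status"]
        if status == "FINISHED":
            return stmt["Id"]
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(
                f"statement {stmt['Id']} ended as {status}: "
                f"{desc.get('Error', 'no error message')}"
            )
        time.sleep(poll_seconds)  # still SUBMITTED / PICKED / STARTED
```

For a SELECT, the returned Id is then passed to get-statement-result, as shown later in this guide.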

Let’s have a look at different SQL statements that can be executed via execute-statement

5) Create Table 

aws redshift-data execute-statement \
    --database <your-db-name> \
    --cluster-identifier <your-cluster-id> \
    --secret-arn <your-secret-arn> \
    --sql "CREATE TABLE test.emp(
        emp_id VARCHAR(4),
        emp_name VARCHAR(4),
        emp_dept VARCHAR(4)
    );" \
    --region <your-region>

6) Load Data into a Table

aws redshift-data execute-statement \
    --database <your-db-name> \
    --cluster-identifier <your-cluster-id> \
    --secret-arn <your-secret-arn> \
    --region <your-region> \
    --sql "COPY test.emp
FROM 's3://path/to/bucket/emp.csv'
IAM_ROLE 'arn:aws:iam::<Your_ACCOUNT>:role/<YourRole>'
DATEFORMAT 'auto'
IGNOREHEADER 1
DELIMITER ','
IGNOREBLANKLINES
REGION 'us-west-2';"

7) Run a SELECT Statement

aws redshift-data execute-statement \
    --database <your-db-name> \
    --cluster-identifier <your-cluster-id> \
    --secret-arn <your-secret-arn> \
    --region <your-region> \
    --sql "SELECT * FROM TEST.EMP;"

The above command returns a JSON response that contains the query ID and other parameters.

{
    "ClusterIdentifier": "redshift-cluster-1",
    "CreatedAt": "2021-09-08T09:39:45.693000-07:00",
    "Database": "dev",
    "Id": "fc44t11f0-0341-4t46a-83cf-d5922afds100a",
    "SecretArn": ""
}
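In a script, the Id field of this response is what you feed to describe-statement and get-statement-result. For example, in Python it can be pulled out of the CLI’s JSON output (the text below is the sample response above, not live data):

```python
import json

# Sample output of `aws redshift-data execute-statement` (copied from above).
response_text = """
{
    "ClusterIdentifier": "redshift-cluster-1",
    "CreatedAt": "2021-09-08T09:39:45.693000-07:00",
    "Database": "dev",
    "Id": "fc44t11f0-0341-4t46a-83cf-d5922afds100a",
    "SecretArn": ""
}
"""

response = json.loads(response_text)
statement_id = response["Id"]  # pass this as --id to get-statement-result
```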

8) Retrieve Results

To retrieve the results of the query, run get-statement-result with the Id returned by execute-statement:

aws redshift-data get-statement-result \
    --id fc44t11f0-0341-4t46a-83cf-d5922afds100a \
    --region <your-region>

The above command returns a JSON response that contains the column metadata and the result values.
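The Records in that JSON arrive as one typed wrapper object per field (stringValue, longValue, booleanValue, isNull, and so on). A small helper to flatten them into plain Python rows; the sample payload below is illustrative, not output from a real cluster:

```python
def flatten_records(result):
    """Convert get-statement-result Records into plain Python rows.

    Each field is a dict with a single key naming its type,
    e.g. {"stringValue": "E001"} or {"longValue": 42}.
    """
    rows = []
    for record in result["Records"]:
        row = []
        for field in record:
            if field.get("isNull"):
                row.append(None)  # NULL columns arrive as {"isNull": true}
            else:
                row.append(next(iter(field.values())))
        rows.append(row)
    return rows

# Illustrative payload in the shape get-statement-result returns:
sample = {
    "ColumnMetadata": [{"name": "emp_id"}, {"name": "emp_name"}],
    "Records": [
        [{"stringValue": "E001"}, {"stringValue": "Ana"}],
        [{"stringValue": "E002"}, {"isNull": True}],
    ],
}
```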

Conclusion

That’s it! Isn’t it easy to set up and use the Redshift Data API? In this blog post, we have discussed the Redshift Data API in detail and how you can use it.

While using the Redshift Data API is insightful, setting up the proper environment can be a hectic task. To make things easier, Hevo comes into the picture. Hevo Data is a No-code Data Pipeline and has awesome 100+ pre-built Integrations that you can choose from.

Visit our website to explore Hevo.

Hevo can help you integrate your data from numerous sources and load it into destinations like Redshift to analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

SIGN UP for a 14-day free trial and see the difference!

Share your experience of learning about the Redshift Data API in the comments section below.
