The world is rapidly becoming digitized. Soon, nearly every aspect of our lives will be connected to the web, making information more widely available and everyday tasks more convenient. While this is a good thing, the sheer amount of data generated by digitization is astonishing. Accordingly, companies need systems that can store this data and analyze it to draw key conclusions.

Using customer information, global corporations can perform tasks such as identifying trends. Luckily, tech giants such as Amazon have risen to the challenge and developed systems to meet the data-driven demand worldwide. Today, several software products, more specifically Cloud-based Data Warehousing solutions, are available to users worldwide. One of them is AWS Redshift.

In this post, you will become familiar with AWS Redshift Primary Keys and Foreign Keys. By the end, you should have a rough idea of what Primary and Foreign Keys are and how to work with them on the AWS Redshift platform. Read on below.

What is Amazon Redshift?

Amazon Redshift, AWS Redshift for short, is a popular Cloud Data Warehousing Solution capable of handling data at exabyte scale. Over the years, it has been adopted by companies all over the world thanks to the numerous benefits it offers.

It is worth noting that AWS Redshift uses Massively Parallel Processing (MPP) to query data, spreading data and query execution across multiple compute nodes that work in parallel. This enables it to perform operations on large data volumes at high speed. To put this into perspective, AWS Redshift can work with data at the exabyte scale, where one exabyte is 10^18 bytes. That's pretty impressive!

Data hosted on Redshift can be encrypted at rest and in transit, providing an extra layer of security for users. Redshift can be deployed with just a few clicks and offers many features that make it easy to import data.

Key Features of Amazon Redshift 

Below are some of the key features that have enabled Amazon Redshift to stand apart from the pack: 

  • Automation Capabilities: With AWS Redshift, you can schedule SQL queries to run automatically, so repetitive tasks such as generating daily, weekly, or monthly reports do not have to be run by hand.
  • Intelligent Optimization: When querying large data sets, the same result can often be obtained through several different queries. AWS Redshift helps in such situations by providing tools and data to improve queries, and it can recommend, and in some cases automatically apply, optimizations such as sort keys and distribution keys.
  • SQL Friendly: AWS Redshift is based on PostgreSQL, so most standard SQL queries work on the platform, although some PostgreSQL features are not supported.
  • Data Encryption: Redshift offers encryption as an extra security feature. Encryption at rest is configured at the cluster level, and SSL connections protect data in transit.


Simplify Amazon Redshift ETL with Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integrations for 150+ Data Sources (40+ free sources), we help you not only export data from sources and load it into destinations but also transform and enrich your data and make it analysis-ready.


Working with AWS Redshift Primary Keys & Foreign Keys


Primary Keys & Foreign Keys are common concepts in the world of Relational Databases. In fact, anyone with basic knowledge of this field should have a rough idea of how they work. Redshift supports them much as other relational databases do. The main difference is operational: Redshift does not enforce them, as covered below.

What are Redshift Primary Key Constraints?

A Redshift Primary Key is a column, or set of columns, in a Redshift table whose value is unique for each record. Some everyday analogies help explain what a Redshift Primary Key truly is:

Common identifiers that we interact with daily include Identification Cards, Driver License Numbers, Social Security Numbers, and Telephone Numbers. A Primary Key Constraint plays the same role for a table: the values in the constrained columns must uniquely identify each row. Just as every person has a unique Social Security Number, every row in a table has a unique primary key value. Each table can have at most one primary key.

Lastly, a Redshift Primary Key can be declared as a column constraint, directly on the column definition. It can also be declared as a table constraint, which is required when the key spans more than one column.

Redshift Primary Key Constraints Examples 

To illustrate how Redshift Primary Key constraints work, here's an example. We'll first create a table called HevoPersons, with ID declared as the primary key at the column level.

CREATE TABLE HevoPersons (
    ID INT NOT NULL PRIMARY KEY,
    LastName VARCHAR(255) NOT NULL,
    FirstName VARCHAR(255),
    Age INT
);

ID is the Primary Key of the HevoPersons table: it uniquely identifies each person in the table by their identification number. It is called a constraint because it restricts the values the column may hold. Every ID must be non-NULL and different from the ID in every other row, so no two persons can have the same ID. On Redshift, only the NOT NULL part is actually enforced. The uniqueness rule is informational.
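
The two rules are no NULL key values and no repeated key values. To make them concrete, here is a minimal Python sketch that checks a batch of rows against a primary key before loading. The helper name `primary_key_violations`, the dict-per-row format, and the sample data are hypothetical illustrations, not part of Redshift or Hevo.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def primary_key_violations(
    rows: Sequence[Mapping[str, object]], key_cols: Sequence[str]
) -> Dict[str, List]:
    """Hypothetical helper: report rows that would break a primary key on key_cols.

    A primary key requires every key column to be non-NULL and every key
    value (or combination of values) to appear at most once.
    """
    # Rows in which any key column is missing or None (SQL NULL).
    null_rows = [r for r in rows if any(r.get(c) is None for c in key_cols)]
    # Count each complete key tuple; any tuple seen more than once is a duplicate.
    counts = Counter(
        tuple(r[c] for c in key_cols)
        for r in rows
        if all(r.get(c) is not None for c in key_cols)
    )
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    return {"null_keys": null_rows, "duplicate_keys": duplicate_keys}


# Illustrative sample rows for the HevoPersons table (made-up values).
people = [
    {"ID": 1, "LastName": "Smith"},
    {"ID": 2, "LastName": "Jones"},
    {"ID": 1, "LastName": "Brown"},   # repeats ID 1
    {"ID": None, "LastName": "Lee"},  # NULL ID
]
report = primary_key_violations(people, ["ID"])
print(report)
```

The check covers composite keys too: pass several column names, such as `["Product_ID", "Warehouse_ID"]`.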

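If you would like to drop the Redshift Primary Key constraint, use ALTER TABLE ... DROP CONSTRAINT with the name of the constraint, not the name of the column. Redshift generates a name for an unnamed primary key. The command below assumes that name is hevopersons_pkey, and you can confirm the actual name in the PG_CONSTRAINT system catalog.

ALTER TABLE HevoPersons
DROP CONSTRAINT hevopersons_pkey;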

Here's another example illustrating a Redshift Primary Key Constraint at the table level. In this example, we create a table "Inventories" that lists items by their Product_ID, Warehouse_ID, and Quantity_on_hand. Because the key spans two columns, it has to be declared at the table level.

CREATE TABLE Inventories (
    Product_ID INT,
    Warehouse_ID INT,
    Quantity_on_hand INT NOT NULL,
    PRIMARY KEY (Product_ID, Warehouse_ID)
  );

Let's insert two records with the same key to see what happens.

INSERT INTO Inventories VALUES (1, 1, 100);
INSERT INTO Inventories VALUES (1, 1, 200);

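The result? A database that enforces primary keys rejects the second INSERT, because two records can't have the same Product_ID and Warehouse_ID. It returns an error such as:

pq: duplicate key value (Product_ID,Warehouse_ID)=(1,1) violates unique constraint "primary"

On Redshift, however, both INSERT statements succeed and the table ends up with two rows for (1, 1), because Redshift does not enforce primary key constraints.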
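
Because Redshift does not enforce primary keys, duplicates like the pair above can reach the table, so it can be worth checking for them after a load. The sketch below uses a standard `GROUP BY ... HAVING COUNT(*) > 1` query. The query is plain SQL and should also run on Redshift. Here it is executed against an in-memory SQLite database through Python's built-in `sqlite3` module, as a stand-in. SQLite does enforce primary keys, so the sketch deliberately leaves the key off to reproduce the rows Redshift would accept. The sample rows are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift in this sketch.
conn = sqlite3.connect(":memory:")
# No PRIMARY KEY here: SQLite would enforce it, whereas Redshift would not,
# so leaving it out reproduces the rows Redshift would store.
conn.execute(
    """CREATE TABLE Inventories (
           Product_ID INT,
           Warehouse_ID INT,
           Quantity_on_hand INT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO Inventories VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 1, 200), (2, 1, 50)],
)

# Key combinations that appear more than once violate the intended primary key.
DUPLICATE_KEYS_SQL = """
    SELECT Product_ID, Warehouse_ID, COUNT(*) AS copies
    FROM Inventories
    GROUP BY Product_ID, Warehouse_ID
    HAVING COUNT(*) > 1
    ORDER BY Product_ID, Warehouse_ID
"""
duplicates = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
print(duplicates)
```

Run against the real Inventories table on Redshift, the same SELECT should list every key combination that appears more than once, together with its number of copies.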

What are Redshift Foreign Key Constraints?

A Redshift Foreign Key is a column (or combination of columns) in one table whose values must match the values of the primary key (or another unique column) in another table.

To explain it better, let's extend our previous example. Suppose the unique identifiers we talked about live in different tables: Social Security Numbers in one, and Driver License Numbers in another. Each is unique within its own table. Now suppose we need to link both to a new unique identifier, Employee ID, in a consolidated table. In that final table, the Social Security Number and Driver License Number columns are Foreign Keys: their values must match values that already exist in the tables they reference.

A key advantage of foreign keys is that, in databases that enforce them, they guarantee referential integrity. This means that if a value in column A refers to a value in column B, that value must exist in column B. So we can't create an Employee ID record with a Social Security Number or Driver License Number that isn't present in the referenced tables. As discussed below, Redshift records these constraints but does not enforce them.
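
In a database that enforces referential integrity, every insert into the referencing table is checked against the referenced table. The minimal Python sketch below models that check using the Social Security Number example. The `ReferencingTable` and `ForeignKeyViolation` classes and the sample values are hypothetical. The sketch shows the check an enforcing database performs, not something Redshift does for you.

```python
class ForeignKeyViolation(Exception):
    """Raised when a referencing value has no match in the referenced table."""


class ReferencingTable:
    """Hypothetical child table that enforces one single-column foreign key.

    This models the check an enforcing database runs on every insert;
    Redshift does not run it.
    """

    def __init__(self, fk_col, parent_keys):
        self.fk_col = fk_col
        # Copy of the referenced key values (e.g. known SSNs) at creation time.
        self.parent_keys = set(parent_keys)
        self.rows = []

    def insert(self, row):
        value = row[self.fk_col]
        # NULL references are allowed; any other value must exist in the parent.
        if value is not None and value not in self.parent_keys:
            raise ForeignKeyViolation(
                f"{self.fk_col}={value!r} not found in referenced table"
            )
        self.rows.append(row)


# Made-up sample values for illustration only.
ssns = {"123-45-6789", "987-65-4321"}
employees = ReferencingTable("SSN", ssns)
employees.insert({"EmployeeID": 1, "SSN": "123-45-6789"})  # accepted
try:
    employees.insert({"EmployeeID": 2, "SSN": "000-00-0000"})  # no such SSN
except ForeignKeyViolation as err:
    print("rejected:", err)
```

Allowing NULL references mirrors how SQL foreign keys usually treat NULLs. A NOT NULL constraint on the column is what forbids them.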

Have a look at the example below to deepen your understanding.

Redshift Foreign Key Constraints Examples

In this example, we create a table called HevoCustomers with columns ID and Name.

CREATE TABLE HevoCustomers (ID INT PRIMARY KEY, Name VARCHAR(255));

Next, we create a Purchases table whose Customer column references HevoCustomers.

CREATE TABLE IF NOT EXISTS Purchases (
    ID INT PRIMARY KEY,
    Customer INT NOT NULL REFERENCES HevoCustomers (ID),
    PurchaseTotal DECIMAL(9,2)
);

Let's insert some records into each table.

INSERT INTO HevoCustomers VALUES (12023, 'Lorren'), (12045, 'Sandra');
INSERT INTO Purchases VALUES (451, 12078, 149.00);

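In a database that enforces foreign keys, the second command fails because customer 12078 doesn't exist in the referenced HevoCustomers table. It returns an error such as:

pq: foreign key violation: value [12078] not found in HevoCustomers@primary [ID]

On Redshift, however, the purchase is inserted without any error, leaving a row that points to a customer who does not exist.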
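
Because Redshift accepts rows like this one, a periodic orphan check can catch broken references. The following is a minimal sketch using a `LEFT JOIN ... WHERE ... IS NULL` query. The query is plain SQL that should also work on Redshift. Here it runs through Python's built-in `sqlite3` module against an in-memory SQLite database standing in for Redshift. SQLite only enforces `REFERENCES` when `PRAGMA foreign_keys` is switched on, and it is off by default. As a result, the orphaned purchase is stored just as it would be on Redshift.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift in this sketch.
conn = sqlite3.connect(":memory:")
# PRAGMA foreign_keys is off by default, so REFERENCES is recorded but not
# enforced, loosely mimicking Redshift's informational foreign keys.
conn.executescript(
    """
    CREATE TABLE HevoCustomers (ID INT PRIMARY KEY, Name VARCHAR(255));
    CREATE TABLE Purchases (
        ID INT PRIMARY KEY,
        Customer INT NOT NULL REFERENCES HevoCustomers (ID),
        PurchaseTotal DECIMAL(9,2)
    );
    INSERT INTO HevoCustomers VALUES (12023, 'Lorren'), (12045, 'Sandra');
    INSERT INTO Purchases VALUES (451, 12078, 149.00);
    """
)

# Purchases whose Customer has no matching HevoCustomers.ID are "orphans".
ORPHANS_SQL = """
    SELECT p.ID, p.Customer
    FROM Purchases AS p
    LEFT JOIN HevoCustomers AS c ON c.ID = p.Customer
    WHERE c.ID IS NULL
"""
orphans = conn.execute(ORPHANS_SQL).fetchall()
print(orphans)
```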

By now you should have a rough idea of how Redshift Primary Keys and Foreign Keys work. These keys document how tables relate to each other, but Redshift does not enforce uniqueness, primary key, or foreign key constraints. They are informational only. Declaring them produces no errors and no warnings; they simply state that 'data in this column should be unique.' The word 'should' is of particular importance here, as opposed to 'must.' If you go ahead and load duplicate data, Redshift accepts it without complaint. Redshift assumes that loaded data is already unique, and its query planner uses declared keys as hints. As a result, queries such as SELECT DISTINCT can return incorrect results if the data actually violates them. Only declare these constraints when your loading process guarantees that they hold.
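
Since Redshift assumes loaded data is already unique, one option is to remove duplicates in the pipeline before the load. The hypothetical `dedupe_by_key` helper below is a minimal sketch that keeps the last row seen for each key. The "latest wins" rule is an assumption: whether the first or last copy should survive depends on your data. The sample rows are made up.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def dedupe_by_key(
    rows: Sequence[Mapping[str, object]], key_cols: Sequence[str]
) -> List[Mapping[str, object]]:
    """Hypothetical pre-load step: keep one row per key, the last one seen.

    Keys are returned in order of first appearance; each carries the values
    of its latest row ("latest wins").
    """
    latest: Dict[Tuple[object, ...], Mapping[str, object]] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        latest[key] = row  # a later row overwrites an earlier one with the same key
    return list(latest.values())


# Made-up Inventories rows, including a duplicate (1, 1) key.
batch = [
    {"Product_ID": 1, "Warehouse_ID": 1, "Quantity_on_hand": 100},
    {"Product_ID": 1, "Warehouse_ID": 1, "Quantity_on_hand": 200},
    {"Product_ID": 2, "Warehouse_ID": 1, "Quantity_on_hand": 50},
]
clean = dedupe_by_key(batch, ["Product_ID", "Warehouse_ID"])
print(clean)
```

Deduplicating before the load keeps the data consistent with the declared primary key, so the planner's assumption that keys are unique actually holds.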

Conclusion 

By now, you should have an idea of what AWS Redshift Primary Keys and Foreign Keys are and what they are intended for. On Redshift, they serve informational purposes and act as hints for the query planner rather than as enforced rules. With this in mind, you stand a better chance of getting the most out of your Redshift data.

While AWS Redshift delivers valuable insights, setting up and managing the right environment on a regular basis is a hectic task. Extracting and integrating data from several heterogeneous sources into a Data Warehouse like Amazon Redshift is also a big task. To make things easier, Hevo comes to your rescue. Hevo Data is a No-code Data Pipeline with 150+ pre-built Integrations that you can choose from.


Hevo can help you integrate your data from numerous sources and load them into destinations like Amazon Redshift to analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

Sign Up for a 14-day free trial and see the difference!

Share your experience of learning about the Redshift Primary Keys and Foreign Keys in the comments section below. We would love to hear from you. 

Skand Agrawal
Customer Experience Engineer, Hevo Data

Skand is a dedicated Customer Experience Engineer at Hevo Data, specializing in MySQL, Postgres, and REST APIs. With three years of experience, he efficiently troubleshoots customer issues, contributes to the knowledge base and SOPs, and assists customers in achieving their use cases through Hevo's platform.
