Are you tired of storing and managing files locally on your Postgres server? You can move your data to a powerful destination such as Amazon Redshift within minutes.

Data engineers are often tasked with moving data between storage systems such as applications, databases, data warehouses, and data lakes. This can be exhausting and cumbersome. You can follow this simple step-by-step approach to transfer your data from PostgreSQL to Redshift so that your data migration goes smoothly.

Why Replicate Data from Postgres to Redshift?

  • Analytics: Postgres is a powerful and flexible database, but it’s probably not the best choice for analyzing large volumes of data quickly. Redshift is a columnar database that supports massive analytics workloads.
  • Scalability: Redshift is built to scale to very large datasets, whereas Postgres may not handle them efficiently for analytics.
  • OLTP and OLAP: Redshift is designed for Online Analytical Processing (OLAP), making it ideal for complex queries and data analysis, whereas Postgres is an Online Transactional Processing (OLTP) database optimized for transactional data and real-time operations.

Methods to Connect or Move PostgreSQL to Redshift

Method 1: Connecting Postgres to Redshift Manually

Prerequisites:

Step 1: Configure PostgreSQL to export data as CSV

Step 1. a) Go to the directory where PostgreSQL is installed.

Postgres Directory

Step 1. b) Open Command Prompt from that file location.

Command Prompt

Step 1. c) Now, open the PostgreSQL shell (psql) as the postgres user. To do so, use the command:

psql -U postgres
Enter into Postgres

Step 1. d) To see the list of databases, you can use the command:

\l
List of Databases

I have already created a database named productsdb here. We will be exporting tables from this database.

This is the table I will be exporting.

Products Table

Step 1. e) To export as .csv, use the following command:

\copy products TO '<your_file_location><your_file_name>.csv' DELIMITER ',' CSV HEADER;

Note: This will create a new file at the mentioned location.

Go to your file location to see the saved CSV file.

products csv file
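
If you need to repeat the export without typing the commands interactively, Step 1 can be scripted. The following is a minimal sketch, assuming `psql` is on your PATH and that the database and table are `productsdb` and `products` as in this walkthrough. The function names `build_export_command` and `export_table` are illustrative and not part of any tool.

```python
import subprocess


def build_export_command(table, csv_path, database, user="postgres"):
    """Return the psql argument list that exports `table` to a CSV file."""
    # psql's -c option accepts a single backslash command such as \copy.
    copy_command = f"\\copy {table} TO '{csv_path}' DELIMITER ',' CSV HEADER"
    return ["psql", "-U", user, "-d", database, "-c", copy_command]


def export_table(table, csv_path, database, user="postgres"):
    """Run the export; raises CalledProcessError if psql exits with an error."""
    command = build_export_command(table, csv_path, database, user)
    subprocess.run(command, check=True)
```

Calling `export_table("products", "products.csv", "productsdb")` would write the file to the current directory; the output path here is a placeholder, so replace it with your own file location.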

Step 2: Load CSV to S3 Bucket

Step 2. a) Log in to your AWS Console and select S3.

AWS Console

Step 2. b) Now, we need to create a new bucket and upload our local CSV file to it.

You can click Create Bucket to create a new bucket.

Bucket Creation AWS

Step 2. c) Fill in the bucket name and required details.

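Note: You can leave Block Public Access enabled. Redshift reads the bucket through the IAM role created in Step 3, so the bucket does not need to be publicly accessible.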

Block Public Access setting

Step 2. d) To upload your CSV file, go to the bucket you created.

General Purpose Buckets

Click Upload to upload the file to this bucket.

Upload file to bucket

You can now see the file you uploaded inside your bucket.

Uploaded File
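
The upload in Step 2 can also be done from a script instead of the console. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that can write to the bucket. The bucket name and object key in the usage note are hypothetical, and the helper names are illustrative.

```python
import subprocess


def s3_uri(bucket, key):
    """Build the s3:// URI for an object key inside a bucket."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def build_upload_command(local_path, bucket, key):
    """Return the AWS CLI argument list that copies a local file to S3."""
    return ["aws", "s3", "cp", local_path, s3_uri(bucket, key)]


def upload_csv(local_path, bucket, key):
    """Run the upload; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_upload_command(local_path, bucket, key), check=True)
```

For example, `upload_csv("products.csv", "my-postgres-exports", "exports/products.csv")` would run `aws s3 cp products.csv s3://my-postgres-exports/exports/products.csv`.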

Step 3: Move Data from S3 to Redshift

Step 3. a) Go to your AWS Console and select Amazon Redshift.

AWS Console

Step 3. b) To load data from S3, Redshift needs permission to read from the bucket. To grant this permission, go to the Security and encryption tab and create an IAM role for Redshift.

Security and Encryption Tab

Click on Manage IAM roles followed by Create IAM role.

Manage IAM Roles

Note: I will select all S3 buckets. You can instead select specific buckets and grant access only to them.

Click Create.

Create IAM role
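
If you grant access to specific buckets rather than all S3 buckets, the permission the role needs can be written as an IAM policy document. The sketch below assumes a hypothetical bucket named `my-postgres-exports` and grants only what a load from S3 requires: listing the bucket and reading its objects. The function name is illustrative, and the console wizard may generate a broader policy than this one.

```python
import json


def s3_read_policy(bucket):
    """Return an IAM policy document allowing read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Hypothetical bucket name; replace it with the bucket you created in Step 2.
print(json.dumps(s3_read_policy("my-postgres-exports"), indent=2))
```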

Step 3. c) Go back to your Namespace and click on Query Data.

Namespace Configuration Window

Step 3. d) Click on Load Data to load data in your Namespace.

Query data window

Click on Browse S3 and select the required Bucket.

Load data configurations page
Browse for bucket window

Note: I don’t have a table created, so I will click Create a new table, and Redshift will automatically create a new table.

Load Data Configuration
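
If you would rather create the destination table yourself than let Redshift create it, you can derive a starting definition from the CSV header. This is a rough sketch, assuming every column is first loaded as `VARCHAR(256)` into a staging table and given proper types afterwards. The helper names are illustrative, and the column names in the usage note are hypothetical because the article does not list the columns of the products table.

```python
import csv


def read_header(lines):
    """Return the column names from the first row of CSV lines or an open file."""
    return next(csv.reader(lines))


def quote_identifier(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, column_names):
    """Build a CREATE TABLE statement with one VARCHAR(256) column per name."""
    columns = ",\n".join(
        f"    {quote_identifier(name)} VARCHAR(256)" for name in column_names
    )
    return f"CREATE TABLE {quote_identifier(table)} (\n{columns}\n);"
```

With an open file handle `f` for the exported CSV, `create_table_sql("products", read_header(f))` returns a statement you can review and run in the query editor before loading.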

Note: Select the IAM role you just created and click on Create.

Step 3. e) Click on Load Data.

Load Data

A query will start that loads your data from S3 into Redshift.

Loading data process
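
Loads from S3 into Redshift are normally done with a `COPY` statement, which you can also write and run yourself in the query editor. The sketch below builds such a statement for a CSV file with a header row. The bucket, key, and role ARN in the usage note are placeholders, and `build_copy_sql` is an illustrative name; the exact query the console generates may differ.

```python
def build_copy_sql(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement that loads a CSV file with a header row."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{key}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"  # skip the header line written by \copy ... CSV HEADER
    )
```

For example, `build_copy_sql("products", "my-postgres-exports", "exports/products.csv", "arn:aws:iam::123456789012:role/MyRedshiftRole")` returns a statement that loads the file uploaded in Step 2 using the role created in Step 3.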

Step 3. f) Run a SELECT query to view your table.
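
Beyond viewing the table, it is worth checking that the number of rows in Redshift matches the exported file. The sketch below counts the data rows in the CSV and builds the two queries to run in the query editor. It assumes the table is named `products` as in this walkthrough; the helper names are illustrative.

```python
import csv


def count_csv_rows(lines):
    """Count data rows in CSV lines or an open file, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row if there is one
    return sum(1 for _ in reader)


def preview_sql(table, limit=10):
    """Build a query that shows the first few rows of the loaded table."""
    return f"SELECT * FROM {table} LIMIT {int(limit)};"


def count_sql(table):
    """Build a query that returns the number of rows in the loaded table."""
    return f"SELECT COUNT(*) FROM {table};"
```

If the result of `count_sql("products")` in Redshift equals `count_csv_rows(f)` for the exported file, no rows were dropped during the load.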

Method 2: Using Hevo Data to connect PostgreSQL to Redshift

Prerequisites:

Step 1: Create a new Pipeline

Pipelines Overview

Step 2: Configure the Source details

Select Source Details

Step 2. a) Select the objects that you want to replicate.

Select Tables

Step 3: Configure the Destination details.

Select Destination

Step 3. a) Give your destination table a prefix name.

Note: Keep Schema mapping turned on. This feature by Hevo will automatically map your source table schema to your destination table.

Final Settings Hevo

Step 4: Your Pipeline is created, and your data will be replicated from PostgreSQL to Amazon Redshift.

Pipeline Overview

Limitations of Using Custom ETL Scripts

The manual approach has limitations that make it hard to keep consistent and accurate data available in Redshift in near real time.

  • The custom ETL script method works well only if you have to move data once or in batches from PostgreSQL to Redshift.
  • It falls short when you have to move data in near real time from PostgreSQL to Redshift.
  • A more efficient approach is to move only the data that changed between two syncs instead of running a full load each time. This is called Change Data Capture (CDC).
  • When you write custom SQL scripts to extract a subset of data, those scripts often break as the source schema changes or evolves.
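
The incremental idea in the list above can be illustrated with a timestamp watermark, which is a simpler technique than log-based Change Data Capture: each sync selects only rows modified after the previous sync. This sketch assumes the source table has an `updated_at` column that is set on every write, which the article does not state, and it uses `%s` as the parameter placeholder in the style of common Python Postgres drivers. Note that a watermark cannot detect deleted rows.

```python
from datetime import datetime


def incremental_query(table, watermark_column):
    """Build a parameterized query selecting rows changed since the last sync."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > %s "
        f"ORDER BY {watermark_column};"
    )


def rows_since(rows, watermark_column, last_synced):
    """In-memory equivalent of the query above, useful for testing the logic."""
    changed = [row for row in rows if row[watermark_column] > last_synced]
    return sorted(changed, key=lambda row: row[watermark_column])


def next_watermark(rows, watermark_column, last_synced):
    """Return the newest timestamp seen, or the old watermark if nothing changed."""
    return max((row[watermark_column] for row in rows), default=last_synced)


if __name__ == "__main__":
    # Hypothetical rows standing in for the products table.
    source = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 1, 3)},
    ]
    last_sync = datetime(2024, 1, 2)
    changed = rows_since(source, "updated_at", last_sync)
    print(changed, next_watermark(changed, "updated_at", last_sync))
```

Each run exports only the changed rows and stores the new watermark for the next run, so the volume moved to S3 and Redshift stays proportional to the changes rather than to the table size.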

Conclusion

This article detailed two methods for migrating data from PostgreSQL to Redshift, providing comprehensive steps for each approach.

The manual ETL process described in the first method comes with various challenges and limitations. For those needing real-time data replication and a fully automated solution, Hevo, the second method, stands out as the optimal choice.

FAQ on PostgreSQL to Redshift

How can the data be transferred from Postgres to Redshift?

You can connect Postgres to Redshift in the following ways:
1. Manually, with the help of the command line and an S3 bucket.
2. Using an automated data integration platform like Hevo.

Is Redshift compatible with PostgreSQL?

The good news is that Redshift is based on PostgreSQL and is largely compatible with it. The bad news is that the two differ in several significant ways, and these differences affect how you design and develop your data warehouse and applications. For example, some features in PostgreSQL 9.0 are not supported by Amazon Redshift.

Is Redshift faster than PostgreSQL?

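Yes, for OLAP workloads. Redshift is a columnar database built for analytics, so it scans and aggregates large volumes of data faster than PostgreSQL. For transactional (OLTP) workloads, PostgreSQL remains the better fit.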

How to connect to Redshift with psql?

You can connect to Redshift with psql in the following steps:
1. First, install psql on your machine.
2. Next, use this command to connect to Redshift:
psql -h your-redshift-cluster-endpoint -p 5439 -U your-username -d your-database
3. psql will prompt for the password. Enter your password, and you will be connected to Redshift.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Check out our transparent pricing to make an informed decision!

Share your understanding of PostgreSQL to Redshift migration in the comments section below!

Aashish
Freelance Technical Content Writer, Hevo Data

Aashish loves learning about data science and helping businesses solve problems through his content on data, software architecture, and integration.