Are you tired of storing and managing analytics data locally on your Postgres server? You can move that data to a purpose-built destination such as Amazon Redshift, often within minutes.
Data engineers are often tasked with moving data between storage systems such as applications, databases, data warehouses, and data lakes. This work can be exhausting and cumbersome. The step-by-step approach below shows how to transfer your data from PostgreSQL to Redshift so that your migration runs smoothly.
Why Replicate Data from Postgres to Redshift?
- Analytics: Postgres is a powerful and flexible database, but it’s probably not the best choice for analyzing large volumes of data quickly. Redshift is a columnar database that supports massive analytics workloads.
- Scalability: Redshift distributes data and query work across multiple nodes, so it can scale to very large datasets, whereas a single Postgres server may not handle massive datasets efficiently.
- OLTP and OLAP: Redshift is designed for Online Analytical Processing (OLAP), making it ideal for complex queries and data analysis. Postgres, by contrast, is an Online Transactional Processing (OLTP) database optimized for transactional data and real-time operations.
Methods to Connect or Move PostgreSQL to Redshift
Method 1: Connecting Postgres to Redshift Manually
Prerequisites:
Step 1: Configure PostgreSQL to export data as CSV
Step 1. a) Go to the directory where PostgreSQL is installed.
Step 1. b) Open Command Prompt from that file location.
Step 1. c) Next, open a PostgreSQL session. To do so, use the command:
psql -U postgres
Step 1. d) To see the list of databases, you can use the command:
\l
In this example, a database named productsdb already exists, and the tables are exported from it. If you are not already connected to it, switch to it with \c productsdb.
The table exported in this walkthrough is products.
Step 1. e) To export as .csv, use the following command:
\copy products TO '<your_file_location><your_file_name>.csv' DELIMITER ',' CSV HEADER;
Note: This will create a new file at the mentioned location.
Go to your file location to see the saved CSV file.
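Steps 1. c) through 1. e) can also be run non-interactively. The sketch below assembles the equivalent psql invocation in Python. It assumes psql is on your PATH; the helper names build_export_command and export_table, the default database productsdb, and the output path are illustrative and not part of the original walkthrough.

```python
import re
import subprocess

# Accept only simple, unquoted table names so the value is safe to embed.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_export_command(table, csv_path, database="productsdb", user="postgres"):
    """Return the argument list for a one-shot psql CSV export (hypothetical helper)."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if "'" in csv_path:
        raise ValueError("csv_path must not contain single quotes")
    # \copy runs on the client, so the file is written on the machine running psql.
    meta_command = f"\\copy {table} TO '{csv_path}' DELIMITER ',' CSV HEADER"
    return ["psql", "-U", user, "-d", database, "-c", meta_command]


def export_table(table, csv_path, **kwargs):
    """Run the export; raises CalledProcessError if psql exits with an error."""
    subprocess.run(build_export_command(table, csv_path, **kwargs), check=True)
```

Calling export_table("products", "products.csv") would write products.csv to the current working directory, provided the server accepts the connection.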
Step 2: Load CSV to S3 Bucket
Step 2. a) Log in to your AWS Console and select S3.
Step 2. b) Next, create a new bucket and upload your local CSV file to it.
Click Create Bucket to create a new bucket.
Step 2. c) Fill in the bucket name and required details.
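Note: You can leave Block Public Access enabled. Redshift reads the bucket through the IAM role created in Step 3, so the bucket does not need to be public.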
Step 2. d) To upload your CSV file, go to the bucket you created.
Click Upload to upload the file to this bucket.
You can now see the file you uploaded inside your bucket.
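The console steps in Step 2 have a command-line equivalent in the AWS CLI. The following sketch builds the two calls in Python. It assumes the AWS CLI is installed and configured with credentials; the bucket name and object key used in the usage note are made-up placeholders.

```python
import subprocess


def s3_uri(bucket, key):
    """Return the s3:// URI for an object key inside a bucket."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def build_upload_commands(csv_path, bucket, key):
    """Return the AWS CLI calls that create the bucket and upload the CSV."""
    return [
        # "mb" (make bucket) fails if the bucket already exists; skip it in that case.
        ["aws", "s3", "mb", f"s3://{bucket}"],
        # "cp" copies the local file to the given bucket and key.
        ["aws", "s3", "cp", csv_path, s3_uri(bucket, key)],
    ]


def upload_csv(csv_path, bucket, key):
    """Run both commands in order, stopping at the first failure."""
    for command in build_upload_commands(csv_path, bucket, key):
        subprocess.run(command, check=True)
```

For example, upload_csv("products.csv", "my-postgres-exports", "exports/products.csv") would place the file at s3://my-postgres-exports/exports/products.csv.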
Step 3: Move Data from S3 to Redshift
Step 3. a) Go to your AWS Console and select Amazon Redshift.
Step 3. b) Redshift needs permission to read from S3 before it can load data from a bucket. To grant this permission, go to Security and encryption and create an IAM role for Redshift.
Click on Manage IAM roles followed by Create IAM role.
Note: This example grants access to all S3 buckets. You can instead select specific buckets and grant access only to those.
Click Create.
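If you prefer to restrict the role to one bucket, the role needs two JSON documents: a trust policy that lets Redshift assume the role, and a permissions policy that allows reads on that bucket. The sketch below generates both. The function names and the bucket name are illustrative, and including the serverless service principal is an assumption based on the walkthrough's use of a namespace.

```python
import json


def redshift_trust_policy():
    """Trust policy that lets the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: both principals are listed so the role works for
                # provisioned clusters and for serverless namespaces.
                "Principal": {
                    "Service": [
                        "redshift.amazonaws.com",
                        "redshift-serverless.amazonaws.com",
                    ]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket):
    """Read-only permissions limited to a single bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # needed for ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # needed for GetObject
                ],
            }
        ],
    }


def to_json(policy):
    """Serialize a policy so it can be pasted into the IAM console."""
    return json.dumps(policy, indent=2)
```

Printing to_json(s3_read_policy("my-postgres-exports")) gives a document you can paste into the IAM policy editor in place of the all-buckets option.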
Step 3. c) Go back to your Namespace and click on Query Data.
Step 3. d) Click on Load Data to load data in your Namespace.
Click on Browse S3 and select the required Bucket.
Note: No table exists yet in this example, so choose Create a new table, and Redshift will create the table automatically.
Note: Select the IAM role you just created and click on Create.
Step 3. e) Click on Load Data.
A query starts and loads your data from S3 into Redshift.
Step 3. f) Run a SELECT query to view your table.
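The load in Step 3 corresponds to a Redshift COPY statement, which you can also write by hand in the query editor. The sketch below builds one for a CSV file with a header row. It assumes the target table already exists; the S3 path and the role ARN in the usage note are placeholders, not values from the walkthrough.

```python
def build_copy_statement(table, s3_uri, iam_role_arn):
    """Return a Redshift COPY statement for a CSV file that has a header row."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"  # skip the header line written by CSV HEADER
    )


def build_preview_query(table, limit=10):
    """Return a simple query for checking the loaded rows."""
    return f"SELECT * FROM {table} LIMIT {int(limit)};"
```

For example, build_copy_statement("products", "s3://my-postgres-exports/exports/products.csv", "arn:aws:iam::123456789012:role/MyRedshiftS3Role") produces a statement you can paste into the query editor, followed by build_preview_query("products") to inspect the result.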
Limitations of Using Custom ETL Scripts
The following challenges affect your ability to keep consistent and accurate data available in Redshift in near real time.
- The custom ETL script method works well only if you need to move data once or in batches from PostgreSQL to Redshift.
- The custom ETL script method falls short when you need to move data from PostgreSQL to Redshift in near real time.
- A more efficient approach is to move only the data that changed between two syncs from Postgres to Redshift instead of performing a full load each time. This approach is called Change Data Capture (CDC).
- Custom SQL scripts written to extract a subset of data often break as the source schema changes or evolves.
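The idea of moving only changed rows between syncs can be illustrated with a timestamp watermark. This is a simpler technique than log-based Change Data Capture, which reads the database's write-ahead log, and a watermark query does not see hard deletes. The sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere; the products columns (id, name, updated_at) and the function names are assumptions made for illustration.

```python
import sqlite3


def fetch_changes(conn, last_sync):
    """Return rows modified after last_sync, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # Advance the watermark only when something changed.
    new_watermark = rows[-1][2] if rows else last_sync
    return rows, new_watermark


def demo():
    """Build a small in-memory table and fetch rows changed since a given time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            (1, "keyboard", "2024-01-01T10:00:00"),
            (2, "mouse", "2024-01-02T09:30:00"),
            (3, "monitor", "2024-01-03T18:45:00"),
        ],
    )
    # ISO 8601 strings sort chronologically, so plain comparison works here.
    return fetch_changes(conn, "2024-01-01T12:00:00")
```

Each sync would export only the returned rows and store the new watermark for the next run. Against PostgreSQL with a driver such as psycopg2, the placeholder would be %s rather than ?.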
Method 2: Using Hevo Data to connect PostgreSQL to Redshift
Prerequisites:
Step 1: Create a new Pipeline
Step 2: Configure the Source details
Step 2. a) Select the objects that you want to replicate.
Step 3: Configure the Destination details.
Step 3. a) Give your destination table a prefix name.
Note: Keep Schema mapping turned on. This feature by Hevo will automatically map your source table schema to your destination table.
Step 4: Your Pipeline is created, and your data will be replicated from PostgreSQL to Amazon Redshift.
Conclusion
This article detailed two methods for migrating data from PostgreSQL to Redshift, providing comprehensive steps for each approach.
The manual ETL process described in the first method comes with various challenges and limitations. For those who need real-time data replication and a fully automated solution, Hevo stands out as the optimal choice.
FAQ on PostgreSQL to Redshift
1. How can the data be transferred from Postgres to Redshift?
You can connect Postgres to Redshift in the following ways:
1. Manually, with the help of the command line and S3 bucket
2. Using automated Data Integration Platforms like Hevo.
2. Is Redshift compatible with PostgreSQL?
Redshift is based on PostgreSQL, so the two are broadly compatible. However, they have several significant differences that affect how you design and develop your data warehouse and applications. For example, some features in PostgreSQL 9.0 have no support from Amazon Redshift.
3. Is Redshift faster than PostgreSQL?
Yes, for OLAP workloads. Redshift's columnar storage lets it run analytical queries over large datasets faster than PostgreSQL, which is optimized for transactional operations.
4. How to connect to Redshift with psql?
You can connect to Redshift with psql in the following steps:
1. First, install psql on your machine.
2. Next, use this command to connect to Redshift:
psql -h your-redshift-cluster-endpoint -p 5439 -U your-username -d your-database
3. It will prompt for the password. Enter your password, and you will be connected to Redshift.
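psql also accepts a single connection URI in place of the separate -h, -p, -U, and -d flags. The sketch below builds such a URI in Python. The endpoint, database, and user in the usage note are placeholders; adding sslmode=require is an assumption about how you want to connect, not a requirement stated above.

```python
from urllib.parse import quote


def redshift_psql_uri(host, database, user, port=5439):
    """Build a libpq connection URI; the password is omitted so psql prompts for it."""
    return (
        f"postgresql://{quote(user, safe='')}@{host}:{port}/"
        f"{quote(database, safe='')}?sslmode=require"
    )
```

For example, redshift_psql_uri("your-redshift-cluster-endpoint", "your-database", "your-username") returns a string you can pass to psql in quotes as its only argument.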
Aashish loves learning about data science and helps businesses solve problems through his content on data, software architecture, and integration.