Replicating data from PostgreSQL on Amazon RDS to Redshift offers a multitude of benefits, unlocking the full potential of your data-driven initiatives.

  1. Amazon RDS provides a scalable and fully managed relational database solution, ensuring effortless deployment and efficient data management.
  2. Meanwhile, Amazon Redshift, with its robust analytics capabilities, delivers a powerful data warehousing solution optimized for complex queries and massive datasets.
  3. By replicating data, you create a centralized repository that enables real-time analytics and empowers data-driven decision-making.
  4. This seamless integration allows you to leverage Redshift’s performance, data processing capabilities, and scalability. As a result, you can drive deeper insights, identify trends, and gain a competitive edge in today’s data-driven landscape.
  5. Read on to explore the two straightforward methods in this article for building a seamless data pipeline that enhances your ability to make data-driven decisions.

Method 1: Load Data from PostgreSQL on Amazon RDS to Redshift

Step 1: Export Data from PostgreSQL on Amazon RDS to a CSV file:

  • Use a PostgreSQL administration tool or psql to connect to your Amazon RDS PostgreSQL database. You’ll need your RDS instance’s endpoint, a username, and a password. If you’re using psql, enter the following command:
psql -h rds_endpoint -U username -d database_name

Replace rds_endpoint, username, and database_name with the appropriate values.
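For instance, with hypothetical placeholder values filled in, the command might look like this (psql will prompt for the password):

psql -h mydb-instance.abc123xyz0.us-east-1.rds.amazonaws.com -U postgres_user -d salesdb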

  • Use psql’s client-side \copy command to export the data to a CSV file. (The server-side COPY ... TO 'file_path' form won’t work here, because Amazon RDS doesn’t give you access to the database server’s filesystem.) Replace table_name and file_path with the table name and the desired local path and file name for the CSV file.
\copy table_name TO 'file_path' WITH CSV HEADER

The WITH CSV HEADER option includes the column headers in the CSV file.

  • You can also export data from multiple tables into a single CSV file by running \copy over a query that combines them with UNION (a sketch follows this list). Keep in mind that the tables must share the same column structure for the data to combine cleanly in one CSV file.
  • After executing the command, the specified table’s data will be exported to a CSV file. Verify and clean the data before uploading it to the S3 bucket.
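
For example, here is a minimal sketch that combines two hypothetical tables, customers_us and customers_eu (assumed to have identical columns), into one CSV file by running \copy over a UNION query. Note that psql requires the entire \copy meta-command to be entered on a single line:

\copy (SELECT id, name, created_at FROM customers_us UNION ALL SELECT id, name, created_at FROM customers_eu) TO 'combined_customers.csv' WITH CSV HEADER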

Step 2: Upload CSV Files to Amazon S3

  • Using AWS Console: Navigate to the S3 service.
  • Create a bucket in Amazon S3. If you already have one you’d like to use, skip ahead to uploading the CSV files.
  • Click on Create Bucket.
  • Follow the prompts to name your bucket, select a region, configure any additional settings as needed, and choose Create.
  • Select the name of the new bucket. Choose the Actions button, and click Create Folder in the drop-down list. Next, name the new folder.
  • Upload and Verify CSV files
  • With your S3 bucket created or selected, click the bucket’s name to open its details page, then open the folder where you want to place the CSV files.
  • Click on the Upload button to start the file upload process.
  • In the file upload dialog, click on Add Files or drag and drop the CSV files you want to move to the S3 bucket.
  • Confirm the upload by clicking the Start Upload button.
  • Once the upload is complete, you can verify that the CSV files are now present in your S3 bucket. Click on the bucket name and navigate through the folder to find your uploaded CSV files.
  • Using AWS CLI: Open a terminal where the AWS CLI is installed and configured.
  • Use the following command to upload CSV files to S3:
aws s3 cp path_csvfile.csv s3://bucket-name/

Replace path_csvfile.csv with the local path to your CSV file and bucket-name with the name of your S3 bucket (including the folder prefix, if any).
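For example, assuming a hypothetical bucket named my-data-bucket with an rds-exports folder, the upload and a quick listing to confirm it succeeded might look like this:

aws s3 cp ./exports/orders.csv s3://my-data-bucket/rds-exports/orders.csv
aws s3 ls s3://my-data-bucket/rds-exports/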

Step 3: Load Data from Amazon S3 into Amazon Redshift

  • Create a new IAM role or use an existing one that has permission to access the S3 bucket where the CSV file is stored.
  • Attach the AmazonS3ReadOnlyAccess policy to the IAM role.
  • Run the COPY command to import the data from the CSV file into your Redshift table.
COPY redshift_table_name
FROM 's3://bucket-name/folder-name/csv-file.csv'
CREDENTIALS 'aws_iam_role=redshift_iam_role'
DELIMITER ',' CSV IGNOREHEADER 1;

Replace redshift_table_name with the actual name of your Redshift table, and redshift_iam_role with the ARN of the IAM role. The COPY command loads the data from the CSV file stored in the S3 bucket into redshift_table_name in Redshift; because the exported file includes a header row, the IGNOREHEADER 1 option tells Redshift to skip it.

  • After executing the COPY command, verify that the data has been transferred correctly.
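
As a quick sanity check, you can compare the row count in Redshift against the source table and look for rejected rows in Redshift’s STL_LOAD_ERRORS system table. A minimal sketch, assuming a hypothetical target table named orders:

SELECT COUNT(*) FROM orders;

SELECT query, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;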

That’s it! You have manually replicated data from PostgreSQL on Amazon RDS to Amazon Redshift using CSV files.

Although the manual approach with CSV files can require significant time and effort, it works well for small datasets or one-time transfers from Amazon RDS PostgreSQL to Redshift.

For occasional backups it is simple and manageable, and it doesn’t require investing in specialized software.

Method 2: Using a No-Code Tool to Build PostgreSQL on Amazon RDS to Redshift ETL Pipeline

Hevo Data is a cloud-based data replication service that accelerates the extraction, transformation, and loading of data between various sources and destinations.

Step 1: Connect and Configure Amazon RDS PostgreSQL as Data Source

Connect Amazon RDS PostgreSQL to the Hevo platform by simply entering database details, such as database host, port, username, and password.

[Image: PostgreSQL on Amazon RDS to Redshift - Configure Source]

Step 2: Connect and Configure Amazon Redshift

To replicate data from AWS RDS PostgreSQL to Redshift, simply provide your Redshift database credentials, including the Database Port, Username, Password, Database Name, Schema, and Cluster Identifier, along with the Destination Name.

[Image: PostgreSQL on Amazon RDS to Redshift - Configure Destination]

Check out some of the unique features of the Hevo Data platform:

  1. Fully Managed
  2. Auto Schema Mapping
  3. Data Transformation
  4. Monitoring and Alerts
  5. Data Security
  6. Live Support

What Can You Achieve with PostgreSQL on Amazon RDS to Redshift ETL?

  1. Enhanced Customer Insights
  2. Deeper Team Insights

Conclusion

  • So far, you’ve seen two different approaches for integrating PostgreSQL on Amazon RDS with Redshift.
  • The first method used CSV files and Amazon S3, and the second used the Hevo Data platform. Each approach has its benefits and is suited to specific scenarios.
  • Using CSV files and Amazon S3 is a manual approach for one-time data transfers. While it can be suitable for small datasets or occasional backups, its time-consuming nature makes it a poor fit for real-time integration.

If you are interested, you can try Hevo by signing up for the 14-day free trial.

Tejaswini Kasture
Technical Content Writer, Hevo Data

Tejaswini is a passionate data science enthusiast and skilled writer dedicated to producing high-quality content on software architecture and data integration. Tejaswini's work reflects her deep understanding of complex data concepts, making them accessible to a wide audience. Her enthusiasm for data science drives her to explore innovative solutions and share valuable insights, helping professionals navigate the ever-evolving landscape of technology and data.
