Two powerhouse services from AWS are Amazon S3 and Aurora – one provides robust object storage, and the other a high-performance relational database compatible with both MySQL and PostgreSQL. Moving your data from S3 into Aurora lets you combine semi-structured and structured data sources, creating a single source of truth for analytics. This enables you to drive insightful, scheduled, or real-time analytics, maximizing the potential of your data.
This guide will outline two straightforward methods for moving Amazon S3 data to an Aurora table. By the end of this article, you will gain a clear understanding and learn the necessary steps to connect Amazon S3 to Amazon Aurora.
Seamlessly integrate your data from Amazon S3 to Amazon Aurora and gain a competitive advantage with faster insights and smarter decision-making.
Why Hevo?
- Integrate your data easily with Hevo’s no-code solution.
- Provides an auto-mapping feature.
- Connect to 150+ sources, including databases and SaaS applications.
Explore why POSTMAN chose Hevo over other data pipelines for easy data ingestion pipeline creation and robust Role-Based Access Control (RBAC). Try out the 14-day full-access free trial today to experience seamless data integration.
Get Started with Hevo for Free
Methods to Connect Amazon S3 to Amazon Aurora
- Method 1: Custom Scripts Approach to Transfer Data from Amazon S3 to Amazon Aurora.
- Method 2: Using No-Code Tool to Connect Amazon S3 to Amazon Aurora
Method 1: Custom Scripts Approach to Transfer Data from Amazon S3 to Amazon Aurora
This method illustrates how you can load data from Amazon S3 into an Aurora PostgreSQL-compatible database. It involves the following steps:
- Installing the aws_s3 extension
- Setting up access to the S3 bucket
- Connecting to Amazon Aurora
- Using a psql query to copy the data to your Aurora PostgreSQL DB cluster
Here are the details of each step for an Amazon S3 to Amazon Aurora integration:
Step 1: Installing the aws_s3 extension
The aws_s3 extension allows you to read data from S3 into Aurora PostgreSQL tables and write data from Aurora PostgreSQL tables to S3. It provides functions such as table_import_from_s3 and query_export_to_s3 to import and export data between the database and an Amazon S3 bucket.
To install the extension, along with its aws_commons dependency, run the following command in your Aurora PostgreSQL CLI (psql).
postgres=> CREATE EXTENSION aws_s3 CASCADE;
To confirm if the extension is successfully installed, you can use the psql \dx metacommand. Alternatively, you can run the query below.
postgres=> SELECT * FROM pg_available_extensions WHERE installed_version IS NOT NULL;
Once the extension is installed, Aurora PostgreSQL can interact with Amazon S3, allowing you to import and export data between the database and S3.
Step 2: Setting up Access to an Amazon S3 Bucket
Now, collect all the details you need to supply to the function. These include the Aurora PostgreSQL DB table name, the bucket name, the file path, the file type, and the AWS Region where the S3 data is located. If you have not yet created a table in the Aurora PostgreSQL DB cluster to receive the S3 data, run the following command. It creates a table named info, which the aws_s3.table_import_from_s3 function will use to import the data.
postgres=> CREATE TABLE info (column1 varchar(60), column2 varchar(60), column3 varchar(60));
To get the Amazon S3 bucket details, open the Amazon S3 console and select Buckets. Find and choose the bucket containing your data. Select its Object Overview page and choose Properties. Here you will get the bucket’s name, path, file type, AWS Region, and Amazon Resource Name (ARN).
Now, set up permissions on your Aurora PostgreSQL-compatible DB cluster to allow access to the Amazon S3 bucket containing the file. You can either create an AWS Identity and Access Management (IAM) role or use security credentials. Here, we will proceed by using an IAM role to access the Amazon S3 bucket (the AWS CLI commands below use the Windows caret ^ for line continuation; on Linux or macOS, replace it with a backslash \). This involves two steps: first, create an IAM policy, and then create an IAM role.
Step 2.1: Create an IAM policy.
To create an IAM policy, run the following command in your AWS CLI:
aws iam create-policy ^
--policy-name aurora-s3-import-policy ^
--policy-document '{
"Version": "YYYY-MM-DD",
"Statement": [
{
"Sid": "s3import",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::nameof-s3-bucket",
"arn:aws:s3:::nameof-s3-bucket/*"
]
}
]
}'
The above command creates an IAM policy named aurora-s3-import-policy that grants access to a bucket named nameof-s3-bucket. This policy provides the bucket and object permissions that allow your Aurora PostgreSQL DB cluster to access Amazon S3.
Step 2.2: Create an IAM role
aws iam create-role ^
--role-name aurora-s3-import-role ^
--assume-role-policy-document '{
"Version": "YYYY-MM-DD",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "rds.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "**********",
"aws:SourceArn": "********"
}
}
}
]
}'
The above command will create a role named aurora-s3-import-role.
After creating the role and policy, attach the IAM policy to the IAM role that you’ve just created using the following command:
aws iam attach-role-policy ^
--policy-arn your-policy-arn ^
--role-name aurora-s3-import-role
Replace your-policy-arn with the ARN of the policy you created in Step 2.1.
Step 3: Connect to Amazon Aurora
Now, add the IAM role to your Aurora PostgreSQL DB cluster using either the AWS CLI or the console. Here is the step-by-step process for both approaches:
Step 3.1: Using AWS CLI
aws rds add-role-to-db-cluster ^
--db-cluster-identifier my-db-cluster ^
--feature-name s3Import ^
--role-arn role-arn ^
--region aws-region
Replace my-db-cluster with your Aurora PostgreSQL DB cluster identifier, role-arn with the ARN of the role created in Step 2.2, and aws-region with the AWS Region of your cluster.
Step 3.2: Using the Console
Log in to your AWS Management Console and open the Amazon RDS console.
Select your Aurora PostgreSQL DB cluster. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this cluster.
Select s3Import under Feature, and then click Add role.
Step 4: Use a psql Query to Copy the Data to your Aurora PostgreSQL DB Cluster
Finally, run the query below to import data from your S3 bucket using the table_import_from_s3 function of the aws_s3 extension.
postgres=> SELECT aws_s3.table_import_from_s3(
'info',
'',
'(format csv)',
:'s3_uri'
);
Let’s break down the above syntax:
- info: The name of the table (created in Step 2) in the Aurora PostgreSQL DB cluster into which the data will be copied.
- '': An optional list of the table columns into which the data should be copied. If you leave it empty, data is copied into all columns of the table.
- (format csv): This specifies the data format of the files in S3. In this case, it’s set to CSV format.
- s3_uri: A psql variable holding the S3 URI that identifies the bucket, file path, and AWS Region of the file to import. It is typically built with aws_commons.create_s3_uri, as sketched below.
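The query above references a psql variable named s3_uri, which you need to define before running the import. Here is a minimal sketch, assuming a hypothetical object exports/data.csv in the nameof-s3-bucket bucket in us-east-1; it builds the URI with aws_commons.create_s3_uri (installed alongside aws_s3 by the CASCADE option) and stores it with the \gset metacommand:
postgres=> SELECT aws_commons.create_s3_uri(
   'nameof-s3-bucket',    -- S3 bucket name from Step 2
   'exports/data.csv',    -- hypothetical file path inside the bucket
   'us-east-1'            -- AWS Region where the bucket lives
) AS s3_uri \gset
postgres=> SELECT aws_s3.table_import_from_s3(
   'info', '', '(format csv)', :'s3_uri'
);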
Likewise, using the aws_s3 extension, you can also import Amazon S3 files that use a custom delimiter, compressed (gzip) files, or files with a different encoding. For instance, a pipe-delimited file only needs a different options string, as sketched below.
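Here is a hedged sketch of a custom-delimiter import, assuming a hypothetical pipe-delimited object exports/data.psv in the same bucket; the delimiter is passed through the COPY-style options string, and the S3 URI is built inline with aws_commons.create_s3_uri:
postgres=> SELECT aws_s3.table_import_from_s3(
   'info',
   '',
   '(format csv, delimiter ''|'')',   -- pipe-delimited file
   aws_commons.create_s3_uri('nameof-s3-bucket', 'exports/data.psv', 'us-east-1')
);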
Note: The above steps apply only to Aurora PostgreSQL-compatible databases. For an Aurora MySQL-compatible database, you can follow this guide; a minimal sketch of the MySQL-side equivalent is shown below.
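For reference, this sketch shows the Aurora MySQL-compatible equivalent, which uses the built-in LOAD DATA FROM S3 statement rather than the aws_s3 extension. It assumes the same hypothetical table and bucket names, that the cluster already has an IAM role with S3 read access attached, and that your database user has the required LOAD FROM S3 privilege:
-- Aurora MySQL only (the aws_s3 extension is PostgreSQL-specific).
LOAD DATA FROM S3 's3://nameof-s3-bucket/exports/data.csv'
INTO TABLE info
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(column1, column2, column3);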
While the custom-scripts method can look time-consuming, it is well suited to several use cases:
- Flexibility: Custom scripts provide you with complete flexibility and control over the entire data movement. You can enrich your existing data by combining data from other sources.
- Data Security: In a highly regulated industry with restrictions on data access, applying custom scripts for data migration is ideal because it does not involve data transfer to third-party servers.
- Data Accuracy: Adding validation checks to your custom scripts lets you confirm that the transferred data is correct and meets the expected standards after migration (see the sketch after this list).
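As a simple illustration of the data-accuracy point above, here is a hedged sketch of two post-load checks you might append to a custom script, using the hypothetical info table created earlier:
-- Compare the loaded row count against the number of rows in the source file.
SELECT COUNT(*) AS loaded_rows FROM info;
-- Spot-check for rows where a required field arrived empty.
SELECT COUNT(*) AS incomplete_rows
FROM info
WHERE column1 IS NULL OR column1 = '';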
Limitations of the Custom Scripts Method to Move Data from Amazon S3 to Amazon Aurora
- Developing custom scripts requires technical skills to design, implement, test, and maintain them. As data volume grows, this becomes time-consuming and resource-intensive.
- Beyond development and maintenance effort, custom scripts also need robust error-handling mechanisms and extensive test cases. They may also require implementing backups through data replication for quick recovery during downtime.
Method 2: Using No-Code Tool to Connect Amazon S3 to Amazon Aurora
Step 1: Connect and Configure Data Source:
- Check the prerequisites before setting Amazon S3 as your source.
- Log in to your Hevo account and click + Create Pipeline.
- Choose S3 and select how you want to connect: IAM Role or Access Credentials.
- Fill in the required details and click Test and Continue.
Step 2: Connect and Configure Destination:
- Check the prerequisites before setting Amazon Aurora as your destination.
- Select the destination for your pipeline.
- Fill in the Aurora account details, such as database host, port, username, password, and database name. After filling in all the details, click Save & Continue.
That’s it! Now the Hevo Data pipeline will start capturing data from Amazon S3 and replicate it into your Amazon Aurora tables.
Check out some of the reasons to choose the Hevo Data platform for your data integration needs:
- User-friendly Interface: Hevo provides a user-friendly, no-code interface to set up data integrations without any coding skills.
- Real-Time Data Integration: Hevo supports real-time data integration, allowing you to capture and process data as soon as it is generated.
- Automatic Schema Detection and Mapping: Hevo detects schema as it extracts data from the source and automatically maps the transformed data to the target schema.
- Monitoring and Alerting: Hevo includes several features to monitor your data pipelines, including error handling and data lineage tracking.
What can you Achieve by Replicating Data from Amazon S3 to Amazon Aurora?
Amazon S3 and Amazon Aurora integration can provide several benefits and enable various use cases. Here are some of them:
- By centralizing your data into a relational database, you can perform real-time sales analysis to gain immediate insights into your customers’ buying behavior, sales trends, and product performance.
- Moving data to Aurora allows you to integrate data from multiple sources, including sales and marketing.
- By migrating data to Aurora, you can create a structured database of all your customer-product relationships. This includes customer profiles, campaign data, website analytics, and advertising data.
Conclusion
The custom-script approach to Amazon S3 to Amazon Aurora ETL discussed in this article is practical mainly for one-off scenarios. Because the process involves writing code and gathering several configuration details, it demands manual time and effort.
Hevo has pre-built integrations with 150+ sources. You can connect your SaaS platforms, databases, etc., to any data warehouse you choose without writing any code or worrying about maintenance. If you are interested, you can try Hevo by signing up for the 14-day free trial.
Frequently Asked Questions
1. Can we query data from S3?
Yes, you can query data directly from Amazon S3 using services such as Amazon Athena, which lets you run SQL queries on S3 data without loading it into a database.
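As a hedged illustration, the sketch below defines an Athena external table over a hypothetical CSV prefix in S3 and queries it in place; the table name, columns, and path are assumptions:
-- Illustrative Athena DDL over files in S3 (names and paths are hypothetical).
CREATE EXTERNAL TABLE IF NOT EXISTS s3_info (
  column1 string,
  column2 string,
  column3 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://nameof-s3-bucket/exports/';
-- Query the files in place; no load step is required.
SELECT column1, COUNT(*) AS row_count
FROM s3_info
GROUP BY column1;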
2. Which AWS service can be used to load data from Amazon S3, transform it, and move it to another destination?
AWS Glue is a service you can use to load data from Amazon S3, transform it, and move it to another destination. It is a fully managed ETL service that you can rely on to prepare and transform data for analytics.
3. What is the most expensive S3?
In Amazon S3, the most expensive storage class per GB stored is generally S3 Standard, which is optimized for high-performance access to frequently used data. However, frequent retrievals and the per-object monitoring charges of S3 Intelligent-Tiering can also add up.
Tejaswini is a passionate data science enthusiast and skilled writer dedicated to producing high-quality content on software architecture and data integration. Tejaswini's work reflects her deep understanding of complex data concepts, making them accessible to a wide audience. Her enthusiasm for data science drives her to explore innovative solutions and share valuable insights, helping professionals navigate the ever-evolving landscape of technology and data.