This article provides a step-by-step guide to setting up the SFTP S3 Integration so that you can securely transfer your data to Amazon S3 for analysis. By the end, you'll be able to set up the integration on your own and use it as a building block for a customized ETL pipeline for your organization. Along the way, you'll gain a deeper understanding of the tools and techniques involved, which will help you hone your skills further.
Before getting started with the SFTP S3 integration, let's briefly discuss SFTP and S3.
Prerequisites
To set up the SFTP S3 Integration, you need:
- Working knowledge of Amazon S3.
- Working knowledge of SFTP/FTP.
- A general idea of the Amazon Web Services environment.
- An AWS account with an Amazon S3 bucket.
What is SFTP (SSH File Transfer Protocol)
SFTP is a robust and secure protocol for establishing connections and transferring files between systems. It runs over the Secure Shell (SSH) protocol, which encrypts both commands and data in transit, allowing users to share files across numerous systems and applications. Unlike classic FTP, which uses separate control and data connections, SFTP needs only a single port (TCP port 22 by default). Because it is built on SSH version 2, it inherits SSH-2's authentication and encryption, giving users stronger security than plain FTP.
For further information on SFTP, you can check the official website.
What is Amazon S3
Amazon Simple Storage Service (Amazon S3) is a widely used object storage service that offers scalable, easy-to-use storage over the Internet. Designed to make web-scale computing easier, it lets you store and retrieve any amount of data, from anywhere, at any time. An individual S3 object can be up to 5 TB in size, but a single PUT upload is limited to 5 GB; larger objects must be uploaded with multipart upload.
Key Features of Amazon S3
The following features make Amazon S3 so popular in today’s market:
- It lets you attach metadata and tags to the objects you store in Amazon S3.
- It protects data against unauthorized and third-party access through access policies and encryption.
- It supports workloads such as big data analytics, data monitoring, and analysis of other relevant activity trends.
For further information on Amazon S3, you can check the official website.
Now that you're familiar with S3 and SFTP, let's jump straight into the SFTP S3 integration.
AWS Transfer for SFTP
AWS Transfer for SFTP (now part of AWS Transfer Family) is a fully managed SFTP service that lets you create an SFTP server and connect it to Amazon Simple Storage Service (Amazon S3) buckets. It also gives you fine-grained control over user access, which strengthens your SFTP S3 integration. You can reuse your existing DNS name and SSH public keys to migrate to Transfer for SFTP easily, so your customers and partners can keep connecting and transferring data as usual, without any changes to existing workflows.
Using AWS Transfer for SFTP for the SFTP S3 integration also gives you access to S3 bucket features such as lifecycle policies, multiple storage classes, and versioning. You can also write AWS Lambda functions to build a “smart” FTP site that processes incoming files as they are uploaded, query the uploaded files in place with Amazon Athena, and connect easily to your existing data import processes. Beyond data processing, you can use other AWS services to generate reports, documents, custom software, and more.
Creating an SFTP Server
The SFTP S3 integration requires an SFTP server. You can create one in AWS using the following steps:
Step 1: Generate a New Server
Go to the AWS Transfer for SFTP console and choose “Create server”. Keep the default values and choose “Create server” again to provision the SFTP server. This is shown in the below image.
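If you prefer to script this step, the console defaults roughly correspond to the parameters below. This is a minimal sketch, assuming an internet-facing endpoint, service-managed users, and S3 storage; the helper name `build_create_server_params` is hypothetical. It only builds the keyword arguments for the `create_server` call of boto3's `transfer` client, so it runs without AWS credentials.

```python
def build_create_server_params(tags=None):
    """Build keyword arguments for boto3's transfer client create_server().

    Assumption: SFTP only, files stored in Amazon S3, users managed by the
    service itself, and an endpoint reachable from the internet.
    """
    params = {
        "Protocols": ["SFTP"],                      # no FTP or FTPS
        "Domain": "S3",                             # back the server with S3
        "IdentityProviderType": "SERVICE_MANAGED",  # users stored in AWS Transfer
        "EndpointType": "PUBLIC",                   # internet-facing endpoint
    }
    if tags:
        # The API expects tags as a list of {"Key": ..., "Value": ...} dicts.
        params["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return params


if __name__ == "__main__":
    print(build_create_server_params({"team": "data"}))
```

With boto3 installed and credentials configured, `boto3.client("transfer").create_server(**build_create_server_params())` returns a response whose `ServerId` identifies the new server.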
Step 2: Create a User and Grant Access
Once you have a server, you can add users to it. Select your server and choose the Add user option. Then enter the user name, select an S3 bucket as the user's home directory, and grant the user account the access it needs. Finally, generate an SSH key pair using ssh-keygen, paste the public key, and click Add.
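The same user setup can be expressed as parameters for the `create_user` call of boto3's `transfer` client. This is a sketch under stated assumptions: the helper name, the placeholder server ID, role ARN, and bucket are all hypothetical, and the key-prefix check is only a rough sanity test, not the service's own validation.

```python
# Rough check for OpenSSH public key formats (assumption: RSA, ECDSA, Ed25519).
SUPPORTED_KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-nistp")


def build_create_user_params(server_id, user_name, role_arn, bucket, public_key):
    """Build keyword arguments for boto3's transfer client create_user()."""
    key = public_key.strip()  # drop the trailing newline left in .pub files
    if not key.startswith(SUPPORTED_KEY_PREFIXES):
        raise ValueError("expected an OpenSSH public key, e.g. from ssh-keygen")
    return {
        "ServerId": server_id,
        "UserName": user_name,
        "Role": role_arn,               # IAM role that grants access to the bucket
        "HomeDirectory": f"/{bucket}",  # the user starts in the bucket root
        "SshPublicKeyBody": key,        # contents of the .pub file
    }


if __name__ == "__main__":
    # All identifiers below are placeholders.
    print(build_create_user_params(
        "s-1234567890abcdef0",
        "alice",
        "arn:aws:iam::111122223333:role/sftp-s3-access",
        "my-sftp-bucket",
        "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5EXAMPLE alice@laptop\n",
    ))
```

A key pair can be generated with, for example, `ssh-keygen -t rsa -b 4096 -f sftp_key`, which writes the private key to `sftp_key` and the public key to `sftp_key.pub`.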
Step 3: Execute Commands
You can now retrieve the server endpoint from your console and execute your first sftp command as shown in the below image.
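For reference, a first connection with the OpenSSH `sftp` client can be assembled as below. This is a minimal sketch: the user name, endpoint, and key path are placeholders, and the helper `build_sftp_command` is hypothetical. It returns an argument list rather than a shell string, which avoids quoting problems.

```python
import shlex


def build_sftp_command(user, endpoint, private_key_path, port=22):
    """Return the argument list for an OpenSSH sftp session."""
    return [
        "sftp",
        "-i", private_key_path,  # private half of the key pasted into the console
        "-P", str(port),         # sftp takes the port with a capital -P
        f"{user}@{endpoint}",
    ]


if __name__ == "__main__":
    cmd = build_sftp_command(
        "alice",
        "s-1234567890abcdef0.server.transfer.us-east-1.amazonaws.com",
        "sftp_key",
    )
    print(shlex.join(cmd))  # the equivalent shell command
```

Passing the list to `subprocess.run` opens an interactive session; from the `sftp>` prompt, standard commands such as `put report.csv`, `ls`, and `get report.csv` upload, list, and download objects in the user's home directory.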
Step 4: Lambda Functions
You can also attach a Lambda function to the bucket to perform extra processing on uploaded files. For example, you can analyze every uploaded image with Amazon Rekognition and route it to a different destination based on the objects it contains.
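The routing idea can be sketched as a Lambda handler for S3 event notifications. This is an illustration under assumptions: the `ROUTES` table and destination prefixes are hypothetical, and labels are supplied by an injected `detect_labels` callable so the logic runs locally. In a real function, that callable might wrap `boto3.client("rekognition").detect_labels(Image={"S3Object": {"Bucket": bucket, "Name": key}})` and return the `Name` of each entry in `Labels`, and the file could then be moved with `copy_object`.

```python
from urllib.parse import unquote_plus

# Hypothetical routing table: Rekognition-style label -> destination prefix.
ROUTES = {"Person": "people/", "Car": "vehicles/", "Dog": "animals/"}
DEFAULT_PREFIX = "unclassified/"


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys in S3 event notifications are URL-encoded ("+" = space).
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def choose_prefix(labels):
    """Return the prefix of the first label that has a route."""
    for label in labels:
        if label in ROUTES:
            return ROUTES[label]
    return DEFAULT_PREFIX


def no_labels(bucket, key):
    """Placeholder label source used when no detector is supplied."""
    return []


def lambda_handler(event, context, detect_labels=no_labels):
    """Decide a destination key for every object in the event."""
    results = []
    for bucket, key in objects_from_event(event):
        filename = key.rsplit("/", 1)[-1]
        destination = choose_prefix(detect_labels(bucket, key)) + filename
        results.append({"bucket": bucket, "key": key, "destination": destination})
    return results
```

Injecting the label source keeps the routing logic testable without AWS access; only the thin wrapper around Rekognition needs credentials.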
Creating a Managed SFTP Service
The SFTP S3 integration also needs a managed SFTP service. You can build a fully managed SFTP server for an S3 bucket from your AWS Management Console. Follow the below-mentioned steps to proceed with the SFTP S3 integration.
- Step 1: First, navigate to the AWS Transfer for SFTP and generate a new server using the steps shown in the previous section.
- Step 2: Grant the new user (created in the previous section) permissions through an IAM role; the role governs what the user can access in S3.
- Step 3: Make sure this role has a trust relationship with transfer.amazonaws.com. To check, open the role's page in the IAM console and click the Trust relationships tab.
- Step 4: Choose the Edit trust relationship option to open the trust policy JSON document, and set Statement[].Principal.Service to transfer.amazonaws.com. This is shown in the below code.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "transfer.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
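The trust policy only lets AWS Transfer assume the role; the role also needs a permissions policy that grants access to the bucket. Below is a minimal sketch of such a policy, assuming read, write, and delete access to one bucket is enough; the helper name and bucket name are hypothetical, and AWS's own examples may grant additional actions such as object-version access.

```python
import json


def s3_access_policy(bucket, prefix=""):
    """Identity-based policy letting the Transfer role use a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: SFTP clients need these to list folders.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: download, upload, and delete files.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_access_policy("my-sftp-bucket"), indent=2))
```

The resulting JSON can be attached as an inline policy, for example with `boto3.client("iam").put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`.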
Connecting to Managed SFTP Server
The next step of SFTP S3 integration requires you to connect to the SFTP server. You can easily connect to the managed SFTP server created in the previous section just as you would connect to any other SFTP server.
You can find the server's hostname listed as the Endpoint on the server's page, in a format similar to the following:
server_id.server.transfer.region.amazonaws.com
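Because the hostname follows this fixed pattern, it can be derived from the server ID and Region. This is a small sketch; the ID check assumes server IDs look like `s-` followed by 17 lowercase hexadecimal characters, which matches the IDs the console displays but is an assumption rather than a documented guarantee here.

```python
import re

# Assumption: server IDs are "s-" followed by 17 lowercase hex characters.
SERVER_ID_PATTERN = re.compile(r"s-[0-9a-f]{17}")


def transfer_endpoint(server_id, region):
    """Build the SFTP hostname for an AWS Transfer for SFTP server."""
    if not SERVER_ID_PATTERN.fullmatch(server_id):
        raise ValueError(f"unexpected server ID: {server_id!r}")
    return f"{server_id}.server.transfer.{region}.amazonaws.com"
```

For example, `transfer_endpoint("s-1234567890abcdef0", "eu-west-1")` yields the hostname to pass to your SFTP client.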
How Does Amazon S3 Transfer Files?
Amazon S3 is a popular object storage service that holds objects ranging from kilobytes to terabytes in size. It is designed to ingest and store data of any size from all types of sources. It exposes a web service interface that is accessible over HTTPS through a REST API, and the AWS console also lets you upload files by simple drag and drop.
S3 performs all interactions at the application level through its RESTful API, using operations such as PUT, GET, LIST, DELETE, and COPY to work with objects in a bucket. This means that to transfer data through the AWS console, you must use its built-in file upload interface. However, transferring files and data from Amazon S3 for further analysis can be a tedious task. This is where Hevo comes in.
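To make the REST mapping concrete, the sketch below describes the HTTP request behind each of these basic operations, assuming virtual-hosted-style URLs. It is illustrative only: the helper names are hypothetical, and real requests must also be signed with AWS Signature Version 4, which is omitted here.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class S3Request:
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def s3_request(operation, bucket, region, key="", copy_source=""):
    """Describe the unsigned HTTP request behind a basic S3 operation."""
    base = f"https://{bucket}.s3.{region}.amazonaws.com"
    object_url = f"{base}/{quote(key)}"  # quote() keeps "/" in keys intact
    if operation == "LIST":
        # ListObjectsV2 is a GET on the bucket itself.
        return S3Request("GET", f"{base}/?list-type=2")
    if operation == "COPY":
        # A copy is a PUT on the destination key, naming the source in a header.
        return S3Request("PUT", object_url,
                         {"x-amz-copy-source": quote(copy_source)})
    if operation in ("PUT", "GET", "DELETE"):
        # Uploads, downloads, and deletes map directly to HTTP verbs.
        return S3Request(operation, object_url)
    raise ValueError(f"unsupported operation: {operation}")
```

In practice an SDK such as boto3 builds and signs these requests for you; the sketch just shows that each console or SFTP action ultimately becomes one of these HTTP calls.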
Hevo, a No-code Data Pipeline, helps customers move all their data from Amazon S3 into their preferred Data Warehouse without writing any code. With a fault-tolerant architecture and exceptional security, Hevo automates many of your data processing tasks. Files in S3 are also often stored in compressed Gzip format; Hevo's Data Pipeline automatically unzips any Gzipped files on ingestion and re-ingests files when their data is updated.
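As a generic illustration of unzipping on ingestion (not Hevo's implementation), a pipeline can detect Gzip data by its two magic bytes and decompress it only when they are present. The helper name below is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress data if it looks like gzip, otherwise return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

Checking the content rather than the `.gz` extension handles files whose names do not reflect their compression.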
Steps to Set Up SFTP S3 Integration
The SFTP S3 integration requires you to set up a new IAM role and modify the Amazon S3 bucket policy to gain cross-account access. You can then leverage the SFTP server associated with your IAM role to establish a connection and start transferring your data to Amazon S3 buckets.
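For the cross-account case, the bucket owner adds a bucket policy that names the SFTP role as a principal. Below is a minimal sketch, assuming the role lives in another AWS account and needs list, read, write, and delete access; the helper name, bucket, and role ARN are placeholders. For cross-account access, both the role's own permissions policy and this bucket policy must allow the action.

```python
import json


def cross_account_bucket_policy(bucket, role_arn):
    """Resource-based policy granting another account's role bucket access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principal = {"AWS": role_arn}  # the IAM role used by the SFTP server
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTransferRoleList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                "Sid": "AllowTransferRoleObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = cross_account_bucket_policy(
        "shared-bucket", "arn:aws:iam::444455556666:role/sftp-s3-access")
    print(json.dumps(policy, indent=2))
```

The bucket owner can apply it with `boto3.client("s3").put_bucket_policy(Bucket="shared-bucket", Policy=json.dumps(policy))`.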
You can set up the SFTP S3 Integration, using the following steps:
Step 1: Generate and Configure an S3 Bucket
To initiate your SFTP S3 Integration, log into your AWS account and click on the create bucket option present under the Buckets tab as shown in the below image.
Give the bucket a suitable name and choose an AWS Region. Then configure its access settings and encryption. Your S3 bucket is now ready.
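The bucket step can also be scripted. This sketch checks a subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no consecutive dots; not formatted like an IP address) and builds `create_bucket` parameters. The helper names are hypothetical, and default SSE-S3 (`AES256`) encryption is one assumed choice among several.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

# Default encryption with S3-managed keys, for put_bucket_encryption().
DEFAULT_ENCRYPTION = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}


def validate_bucket_name(name):
    """Check a subset of the S3 bucket naming rules."""
    return (bool(NAME_PATTERN.fullmatch(name))
            and not IP_LIKE.fullmatch(name)
            and ".." not in name)


def build_create_bucket_params(name, region):
    """Build keyword arguments for boto3's s3 client create_bucket()."""
    if not validate_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    params = {"Bucket": name}
    if region != "us-east-1":
        # us-east-1 is the default Region and is omitted as a LocationConstraint.
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params
```

After `create_bucket(**params)`, encryption can be set with `put_bucket_encryption(Bucket=name, ServerSideEncryptionConfiguration=DEFAULT_ENCRYPTION)`.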
Step 2: Build an FTP Server
Navigate to the AWS console and create a new server with AWS Transfer for SFTP (the steps are given in the previous sections), selecting SFTP as its protocol, as shown below.
Step 3: Create User Accounts
This brings us to the last step of the SFTP S3 integration. By default, user permissions are enforced through the associated IAM role. However, you can also plug in a custom identity provider through an API. Next, configure the endpoint and choose S3 as your default storage.
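AWS Transfer Family can delegate authentication to a custom identity provider backed by API Gateway (`IdentityProviderType="API_GATEWAY"`) or AWS Lambda (`"AWS_LAMBDA"`). The sketch below shows the Lambda variant for SSH key users, based on our understanding of the contract: the event carries fields such as `username` and `password`, and the response returns `Role`, `HomeDirectory`, and `PublicKeys`, with an empty response denying access. Check the field names against the current AWS documentation; the `USERS` directory and every ARN and key in it are hypothetical.

```python
# Hypothetical user directory; a real provider might query a database instead.
USERS = {
    "alice": {
        "role": "arn:aws:iam::111122223333:role/sftp-s3-access",
        "home": "/my-sftp-bucket/alice",
        "public_keys": ["ssh-ed25519 AAAAC3NzaC1lZDI1NTE5EXAMPLE alice@laptop"],
    }
}


def lambda_handler(event, context):
    """Return the user's role, home directory, and keys, or {} to deny."""
    if event.get("password"):
        # This sketch only supports SSH key authentication.
        return {}
    user = USERS.get(event.get("username", ""))
    if user is None:
        return {}  # unknown user: empty response means authentication fails
    return {
        "Role": user["role"],
        "HomeDirectory": user["home"],
        "PublicKeys": user["public_keys"],
    }
```

Keeping the lookup in one small function makes it easy to swap the in-memory dictionary for a real user store later.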
Now, click “Add user” on the server's page, as shown in the below image. This way, you can add the users who will use the SFTP S3 integration.
Fill in the required details and create a user account. Also, choose the S3 bucket as your Home Directory as shown below.
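Instead of pointing a user at the bucket root, you can restrict each user to their own prefix with a logical home directory. This is a sketch assuming a per-user folder layout; the helper name is hypothetical. The returned keys are used in place of `HomeDirectory` when calling `create_user` or `update_user` on boto3's `transfer` client.

```python
def logical_home_params(bucket, user_name):
    """Parameters that lock a user into /<bucket>/<user_name>."""
    return {
        "HomeDirectoryType": "LOGICAL",
        "HomeDirectoryMappings": [
            # The user sees "/" but actually reads and writes under the prefix.
            {"Entry": "/", "Target": f"/{bucket}/{user_name}"},
        ],
    }
```

With this mapping, a user who runs `cd ..` from the root stays at `/`, so they cannot browse other users' folders in the same bucket.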
Finally, paste your SSH public key to complete the user account. You can now connect with an SFTP client using the server endpoint (which contains the server ID) and the new user's private SSH key.
That’s it! Your SFTP S3 Integration is ready.
Conclusion
This article showed you how to set up the SFTP S3 Integration manually using AWS Transfer for SFTP, explaining the concepts behind every step to help you understand and implement them efficiently. However, this manual approach to setting up the SFTP S3 integration adds considerable overhead in terms of time and resources.
Such a solution will require skilled engineers and regular data updates. Furthermore, you will have to build an in-house solution from scratch if you wish to transfer your data from SFTP or S3 to a Data Warehouse for analysis.
Tell us about your experience of setting up the SFTP S3 Integration! Share your thoughts in the comments section below!
Aman Deep Sharma is a data enthusiast with a flair for writing. He holds a B.Tech degree in Information Technology, and his expertise lies in making data analysis approachable and valuable for everyone, from beginners to seasoned professionals. Aman finds joy in breaking down complex topics related to data engineering and integration to help data practitioners solve their day-to-day problems.