FTP S3 Integration: 2 Easy Methods

on Data Integration, Data Warehouse, ETL, Tutorials • January 31st, 2021

An array of beneficial features can be harnessed by making Amazon S3 transfers over FTP. The FTP S3 combination enhances data availability and improves access without the usual system limitations. This article deals with different aspects of FTP and Amazon S3, in addition to a detailed guide describing how to transfer data to S3 using FTP, along with alternative solutions.

This article provides a step-by-step guide to help you set up the FTP S3 integration with ease, so that you can securely transfer your data to Amazon S3 for fruitful analysis. Upon a complete walkthrough of the content, you’ll be able to connect an FTP server to Amazon S3 easily, which will further help you build a customized ETL pipeline for your organization. Along the way, you will gain a deep understanding of the tools and techniques involved, helping you hone your skills further.


Introduction to FTP [File Transfer Protocol]


FTP is a robust and fast network protocol that allows large file transfers based on a client-server model. Features like block storage, Storage Area Networks (SAN), and Network Attached Storage (NAS) can be used to optimize file transfer and storage strategies. When used with Amazon S3, this protocol can significantly improve system performance.

More information regarding the FTP [File Transfer Protocol] can be found here.

Introduction to Amazon S3


Amazon’s Simple Storage Service (S3) renders a range of comprehensive, simple-to-implement services over the Internet. With a set of dedicated features for the storage, movement, security, configuration, and analytics of data, it is a popular choice among businesses.

Creating an S3 FTP file transfer mechanism offers several significant benefits: file storage overhead can be minimized while planning and administration capacity increase. Using S3 object storage as a file system lets you carry out interactions at the application level; however, S3 cannot be mounted directly, which is where a tool such as S3FS comes in.

More information regarding Amazon S3 can be found here.

2 Methods to Implement AWS S3 FTP Integration

Method 1: FTP S3 Integration using Manual Method

In this method, you will manually implement your FTP S3 Integration using S3FS. S3FS-FUSE is a FUSE-based file system that lets you mount an S3 bucket directly as a local filesystem. Your FTP server can then upload and synchronize files to the configured Amazon S3 bucket.

Method 2: FTP S3 Integration using Hevo’s No-code Data Pipeline


A fully managed, No-code Data Pipeline platform like Hevo Data helps you load data from SFTP/FTP (among 100+ sources) to Amazon S3 in real-time, in an effortless manner. Hevo, with its minimal learning curve, can be set up in a matter of minutes, letting users load data without compromising performance. Its strong integration with various sources such as databases, files, and analytics engines gives users the flexibility to bring in data of all kinds as smoothly as possible, without having to write a single line of code.


Prerequisites

  • Working knowledge of Amazon S3.
  • Working knowledge of FTP [File Transfer Protocol].
  • A general idea about the Amazon Web Services environment.
  • An Amazon S3 account and bucket.

How Does Amazon S3 Transfer Files?

Amazon’s AWS S3 (Amazon Simple Storage Service) is the best-known example of an object storage service. A single object in S3 can range from a few bytes up to 5 Terabytes in size. Objects are categorized and stored in AWS S3 using a “buckets” structure.

S3 exposes a simple web services interface that allows you to store and retrieve any amount of data from anywhere at any time. S3 is accessed over the secure web-based protocol HTTPS through a REST Application Programming Interface (API), also known as a RESTful API.

With the S3 console’s simple drag-and-drop tool, you can upload files, directories, or other data.

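For a quick command-line illustration, the AWS CLI provides the same upload capability. This is a minimal sketch, assuming the CLI is installed and configured with valid credentials; the bucket name and local paths below are placeholders:

# Upload a single file to the bucket
aws s3 cp ./report.csv s3://my-example-bucket/report.csv

# Upload an entire directory recursively
aws s3 cp ./local-data s3://my-example-bucket/local-data --recursive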

Storage Units: FTP vs S3 Buckets

FTP and SFTP were created for transferring files, while Amazon S3 buckets were created for storing objects. Although both are used for remotely sharing and storing data, they operate in different ways.

Files are the transactional units of file-based storage. FTP and SFTP (SSH File Transfer Protocol) are file transfer protocols that access data in storage at the file level, working with files organized in a hierarchy of folders. Objects are another form of storage unit that bundle the data itself, its associated (expandable and customizable) metadata, and a globally unique identifier (GUID). An object can be a file, part of a file, or a collection of otherwise unconnected bits and bytes. Amazon S3 buckets store objects.


Blocks are a different sort of storage unit, utilized in structured Database Storage and Virtual Machine File System (VMFS) volumes. The scope of this guide does not include blocks.
When it comes to storing large volumes of unstructured data, object storage surpasses file storage. Objects are easy to access and deliver high streaming throughput regardless of the amount of storage.

Cloud-based object storage, such as Amazon S3 buckets, scales significantly better than SFTP or FTP server storage in general.

2 Methods to Implement AWS S3 FTP Integration

Method 1: FTP S3 Integration using Manual Method

S3FS-FUSE is a FUSE-based file system that provides a fully functional file system backed by Amazon S3: an S3 bucket can be mounted directly as a local filesystem. Read and write access is available, along with the fundamental file-management commands for manipulating its contents.

More information regarding S3FS can be found here.

Procedure for FTP S3 Integration using Manual Method

To install and set up S3 FTP and carry out S3 transfers using FTP, implement the following steps:

Step 1: Create an S3 Bucket

To get started, create an S3 bucket using the AWS console. This will serve as the final destination for all files transferred through FTP.

Step A: Create a Bucket

In the AWS console, open the S3 service and choose Create bucket.


Step B: Configure your Bucket

Set up your bucket. Give it a name and choose the AWS region, as well as access, versioning, and encryption options. After that, your S3 bucket should be ready to use.

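If you prefer working from the command line, the same bucket can be created with the AWS CLI. The sketch below uses the bucket name and region that appear later in this guide; note that for us-east-1, the --create-bucket-configuration argument must be omitted:

aws s3api create-bucket \
    --bucket ca-s3fs-bucket \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2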

Step C: Create a New Server

Open AWS Transfer for SFTP in the AWS console and create a new server.


Step D: Choose a Protocol

Choose the protocol (SFTP, FTPS, or FTP) when configuring your new server.


Step 2: Create an IAM Policy and Role for S3 Bucket

Proceed to create an IAM [Identity and Access Management] Policy and Role that grant read/write access to the S3 bucket created earlier. Save the policy document shown below as s3fs-policy.json, and then create the policy by running the following command:

aws iam create-policy \
--policy-name S3FS-Policy \
--policy-document file://s3fs-policy.json

In the policy document, replace the bucket name “ca-s3fs-bucket” with the name of the S3 bucket that you will be using within your own environment:

{
   "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::ca-s3fs-bucket"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": ["arn:aws:s3:::ca-s3fs-bucket/*"]
        }
    ]
}

Finally, create the S3FS role and attach the S3FS policy to it. This can be done using the AWS IAM console.
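
If you prefer to script this step too, the role and instance profile can also be created with the AWS CLI. The following is a sketch: the trust policy allows EC2 to assume the role, and the policy ARN must be replaced with the one returned by the create-policy command above (the account ID shown is a placeholder):

cat > ec2-trust-policy.json << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

aws iam create-role \
    --role-name S3FS-Role \
    --assume-role-policy-document file://ec2-trust-policy.json

aws iam attach-role-policy \
    --role-name S3FS-Role \
    --policy-arn arn:aws:iam::123456789012:policy/S3FS-Policy

# An instance profile of the same name is needed so EC2 can use the role
aws iam create-instance-profile --instance-profile-name S3FS-Role
aws iam add-role-to-instance-profile \
    --instance-profile-name S3FS-Role \
    --role-name S3FS-Role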

Step 3: Proceed to Launch FTP Server

In this step, you need to launch an EC2 instance on Amazon Linux 2 for hosting the FTP service. 

Run the following command to launch the instance with the attached S3FS role:

aws ec2 run-instances \
--image-id ami-0d1000aff9a9bad89 \
--count 1 \
--instance-type t3.micro \
--iam-instance-profile Name=S3FS-Role \
--key-name EC2-KEYNAME-HERE \
--security-group-ids SG-ID-HERE \
--subnet-id SUBNET-ID-HERE \
--associate-public-ip-address \
--region us-west-2 \
--tag-specifications \
'ResourceType=instance,Tags=[{Key=Name,Value=s3fs-instance}]' \
'ResourceType=volume,Tags=[{Key=Name,Value=s3fs-volume}]'

Within this command, the --associate-public-ip-address parameter temporarily assigns a public IP address to the instance; a production environment would use an Elastic IP (EIP) address instead. The --iam-instance-profile parameter launches the instance with the S3FS role attached.
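
For reference, allocating and associating an Elastic IP with the AWS CLI looks roughly like this; the instance and allocation IDs below are placeholders, with the allocation ID coming from the output of the first command:

aws ec2 allocate-address --domain vpc --region us-west-2

aws ec2 associate-address \
    --instance-id i-0123456789abcdef0 \
    --allocation-id eipalloc-0123456789abcdef0 \
    --region us-west-2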

Step 4: Installation and Building of S3FS

Next, build and install S3FS on the instance. Update the local OS packages and install the additional packages required for the build and compilation, as shown:

sudo yum -y update && \
sudo yum -y install \
jq \
automake \
openssl-devel \
git \
gcc \
libstdc++-devel \
gcc-c++ \
fuse \
fuse-devel \
curl-devel \
libxml2-devel

Use the following commands to clone, build, and install S3FS, and then verify the installation:

git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse/
 
./autogen.sh
./configure
 
make
sudo make install
 
which s3fs
s3fs --help

Step 5: User Account and Home Directory Configuration

Create a user account for authenticating with the FTP service by using the following commands:

sudo adduser ftpuser1
sudo passwd ftpuser1

Next, create a home directory configured for use with the new user account. You can use the following commands to create it:

sudo mkdir /home/ftpuser1/ftp
sudo chown nfsnobody:nfsnobody /home/ftpuser1/ftp
sudo chmod a-w /home/ftpuser1/ftp
sudo mkdir /home/ftpuser1/ftp/files
sudo chown ftpuser1:ftpuser1 /home/ftpuser1/ftp/files

Step 6: Installation and Configuration of FTP 

The next stage involves the installation and configuration of the actual FTP service. Install the vsftpd package and back up the default configuration as shown:

sudo yum -y install vsftpd
 
sudo mv /etc/vsftpd/vsftpd.conf /etc/vsftpd/vsftpd.conf.bak

Generate a new configuration file by running the following script. Note that $USER is escaped as \$USER so that it is written literally into the file for vsftpd to expand, rather than being expanded by the shell:

sudo -s
EC2_PUBLIC_IP=`curl -s ifconfig.co`
cat > /etc/vsftpd/vsftpd.conf << EOF
anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
chroot_local_user=YES
listen=YES
pam_service_name=vsftpd
tcp_wrappers=YES
user_sub_token=\$USER
local_root=/home/\$USER/ftp
pasv_min_port=40000
pasv_max_port=50000
pasv_address=$EC2_PUBLIC_IP
userlist_file=/etc/vsftpd.userlist
userlist_enable=YES
userlist_deny=NO
EOF
exit

You can verify the resulting vsftpd configuration and properties as follows:

sudo cat /etc/vsftpd/vsftpd.conf
 
anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
chroot_local_user=YES
listen=YES
pam_service_name=vsftpd
userlist_enable=YES
tcp_wrappers=YES
user_sub_token=$USER
local_root=/home/$USER/ftp
pasv_min_port=40000
pasv_max_port=50000
pasv_address=X.X.X.X
userlist_file=/etc/vsftpd.userlist
userlist_deny=NO

Add the user account to the vsftpd user list by running echo "ftpuser1" | sudo tee -a /etc/vsftpd.userlist. You can now start the FTP service by running sudo systemctl start vsftpd.

Check the status of the FTP service and verify the startup by running sudo systemctl status vsftpd.

The status can be observed as follows:

vsftpd.service - Vsftpd ftp daemon
Loaded: loaded (/usr/lib/systemd/system/vsftpd.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2019-08-13 22:52:06 UTC; 29min ago
Process: 22076 ExecStart=/usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf (code=exited, status=0/SUCCESS)
Main PID: 22077 (vsftpd)
CGroup: /system.slice/vsftpd.service
└─22077 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
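
If you also want the FTP service to start automatically whenever the instance reboots, you can enable it through systemd:

sudo systemctl enable vsftpd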

Step 7: Run a Test with FTP Client

Finally, run a test with an FTP client to verify the FTP service. Start by authenticating as the FTP user account, as shown below (the IP address is the instance’s public IP):

ftp 18.236.230.74
Connected to 18.236.230.74.
220 (vsFTPd 3.0.2)
Name (18.236.230.74): ftpuser1
331 Please specify the password.
Password:
230 Login successful.
ftp>

After successful authentication, switch to passive mode before performing the FTP upload and carry out the test as shown:

ftp> passive
Passive mode on.
ftp> cd files
250 Directory successfully changed.
ftp> put mp3data
227 Entering Passive Mode (18,236,230,74,173,131).
150 Ok to send data.
226 Transfer complete.
131968 bytes sent in 0.614 seconds (210 kbytes/s)
ftp>
ftp> ls -la
227 Entering Passive Mode (18,236,230,74,181,149).
150 Here comes the directory listing.
drwxrwxrwx    1 0 0             0 Jan 01 1970 .
dr-xr-xr-x    3 65534 65534          19 Oct 25 20:17 ..
-rw-r--r--    1 1001 1001       131968 Oct 25 21:59 mp3data
226 Directory send OK.
ftp>

Remove the remote test file and terminate the FTP session before proceeding to configure the S3FS mount:

ftp> del mp3data
ftp> quit

Step 8: Initiate S3FS and Mount Directory

Run the following script, which gathers the bucket name, IAM role, and region before launching S3FS to mount the bucket:

EC2METALATEST=http://169.254.169.254/latest && \
EC2METAURL=$EC2METALATEST/meta-data/iam/security-credentials/ && \
EC2ROLE=`curl -s $EC2METAURL` && \
S3BUCKETNAME=ca-s3fs-bucket && \
DOC=`curl -s $EC2METALATEST/dynamic/instance-identity/document` && \
REGION=`jq -r .region <<< $DOC`
echo "EC2ROLE: $EC2ROLE"
echo "REGION: $REGION"
 
sudo /usr/local/bin/s3fs $S3BUCKETNAME \
-o use_cache=/tmp,iam_role="$EC2ROLE",allow_other /home/ftpuser1/ftp/files \
-o url="https://s3-$REGION.amazonaws.com" \
-o nonempty

If required, you can debug the S3FS FUSE mount by re-running it in the foreground with debug logging enabled:

sudo /usr/local/bin/s3fs ca-s3fs-bucket \
-o use_cache=/tmp,iam_role="$EC2ROLE",allow_other /home/ftpuser1/ftp/files \
-o dbglevel=info -f \
-o curldbg \
-o url="https://s3-$REGION.amazonaws.com" \
-o nonempty

Check the status of the S3FS process and ensure that it’s running as required. This can be done as shown:

ps -ef | grep  s3fs
 
root 12740 1  0 20:43 ? 00:00:00 /usr/local/bin/s3fs 
ca-s3fs-bucket -o use_cache=/tmp,iam_role=S3FS-Role,allow_other 
/home/ftpuser1/ftp/files -o url=https://s3-us-west-2.amazonaws.com

Once verified, your system is ready to try out an end-to-end transfer.
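
Note that the mount created above will not survive a reboot. As a rough sketch, assuming a recent s3fs-fuse release that registers the fuse.s3fs mount helper (check the s3fs-fuse README for your version), an /etc/fstab entry along these lines can remount the bucket at boot:

# /etc/fstab entry (a single line) to remount the S3 bucket at boot
ca-s3fs-bucket /home/ftpuser1/ftp/files fuse.s3fs _netdev,allow_other,use_cache=/tmp,iam_role=S3FS-Role,url=https://s3-us-west-2.amazonaws.com 0 0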

Step 9: Run S3 FTPS to Perform File Transfer 

Set up FTPS and proceed to carry out your file transfer. You can test the connection using a client like FileZilla. Once the transfer succeeds, check your AWS S3 web console: files uploaded over FTP into your user directory are now automatically synchronized to the configured Amazon S3 bucket.
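
FTPS support is standard vsftpd functionality, but the exact setup depends on your environment. As a minimal sketch, assuming a self-signed certificate is acceptable for testing (the certificate path and validity period below are placeholders), generate a certificate and append the SSL options to the vsftpd configuration:

sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout /etc/vsftpd/vsftpd.pem \
    -out /etc/vsftpd/vsftpd.pem

sudo -s
cat >> /etc/vsftpd/vsftpd.conf << EOF
ssl_enable=YES
rsa_cert_file=/etc/vsftpd/vsftpd.pem
rsa_private_key_file=/etc/vsftpd/vsftpd.pem
force_local_data_ssl=YES
force_local_logins_ssl=YES
EOF
exit

sudo systemctl restart vsftpd

After restarting, connect from FileZilla using “Require explicit FTP over TLS” and confirm that files uploaded into the files directory appear in the configured bucket, either in the S3 console or via aws s3 ls s3://ca-s3fs-bucket/.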

Limitations of FTP S3 Integration using Manual Method

  • Requires significant time investment and technical knowledge to implement.
  • Potential errors with manual implementation could lead to data loss.
  • Lack of an update process could lead to incompatibilities in the long run.

Method 2: FTP S3 Integration using Hevo’s No-code Data Pipeline


Hevo Data, a No-code Data Pipeline, helps you transfer data from FTP (among 100+ sources) to Amazon S3 & lets you visualize it using a BI tool. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on crucial business needs and perform insightful analysis using various BI tools such as Power BI, Tableau, etc. 

Steps to use Hevo Data

Hevo Data focuses on two simple steps to get you started:

  • Configure Source: Connect Hevo Data with your FTP server by providing a unique name for your pipeline along with the type of server you want to connect with. You will further need to provide credentials such as your username and password to allow Hevo access, along with the associated port and host values. You can also specify the type of file you want to transfer, choosing between XML, JSON, and CSV. In case you want to connect using SSH, you can enable that option as well.
  • Integrate Data: Load data from SFTP/FTP to S3 by providing a unique name for your destination along with information about your Amazon S3 bucket: its name, region, your AWS access key ID, secret access key, and the file format in which you want to write your data. You can choose either JSON or ORC.

Check out what makes Hevo amazing

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a safe & consistent manner with zero data loss.
  • Minimal Learning: Hevo, with its interactive UI, is simple for new customers to work on and perform operations.
  • Live Monitoring: Hevo allows you to monitor the data flow, so you can check where your data is at a particular point in time.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export. 
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.

FTP/SFTP Clients for S3

The final approach to FTP/SFTP to Amazon S3 is to use an FTP/SFTP client that supports Amazon S3 directly. FileZilla, WinSCP, and Cyberduck are some examples of such clients. If the FTP client is also an S3 client, you will not need to configure anything on the server.

Using FileZilla, an open-source platform

Use FileZilla, a free, open-source, and cross-platform FTP server/client. The FileZilla client supports FTP, SFTP, and FTPS, although the server supports only FTP and FTPS. With the Pro edition, FileZilla additionally supports AWS S3 (among other cloud storage services), making it easy to FTP to Amazon S3.

When configuring the FileZilla client, you can set the host and select S3 as the protocol.

  • It’s important to remember that you’re configuring a client, not a server. As a result, you’ll be able to view but not alter the data.
  • FTP is not advised for file transfers over public networks. For security reasons, you’ll need to set up the FTP server for S3 inside a VPC (Virtual Private Cloud) or behind a VPN. If you want to access files over the Internet, use FTPS instead (supported by FileZilla).

Conclusion 

This article teaches you how to set up the FTP S3 integration with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging, especially for a beginner & this is where Hevo saves the day.


Hevo Data, a No-code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write any code. Hevo, with its strong integration with 100+ sources & BI tools, allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about the FTP S3 Integration! Let us know in the comments section below!
