This article discusses a specific Data Engineering scenario in which data is moved from a Secure File Transfer Protocol (SFTP) or File Transfer Protocol (FTP) server to Snowflake. Let’s first take a quick look at the source and destination in this data flow and then examine the exact steps of the process.
This blog covers both approaches to moving data from FTP to Snowflake, with the custom ETL method explained in detail. It also sheds light on the limitations of that approach so that you can take the path that suits your use case best.
What Is FTP?
FTP stands for File Transfer Protocol, a standard network protocol for transferring files over the Internet. It was created in the early 1970s to push and pull large files between systems.
- Data Encryption: FTP alone does not encrypt the data being transferred, which means both the data and login credentials can be intercepted.
- Port: FTP uses port 21 for its command channel.
- Data Channels: FTP uses two channels:
- Command Channel: used to send commands
- Data Channel: used for transferring data
- Authentication: FTP requires a username and password to authenticate, but even these credentials are transmitted in plain text.
What is Snowflake?
Snowflake is a fully managed SaaS that provides a single platform for data warehousing, data lakes, data engineering, data science, and data application development, while ensuring the secure sharing and consumption of real-time/shared data. It offers a cloud-based data storage and analytics service, often described as data warehouse-as-a-service, that organizations can use to store and analyze data using cloud-based hardware and software.
Key Features of Snowflake
- Data Governance and Compliance: Advanced security features include end-to-end encryption and support for regulatory compliance.
- Multi-cluster Shared Data Architecture: Allows elastic scaling of compute resources independent of storage.
- Separate Storage and Compute: Optimizes cost and performance with the ability to scale storage and compute independently (see the quick example after this list).
- Data Security with Sharing: Enables real-time data sharing without compromising privacy or security.
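To make the storage/compute separation concrete, here is a minimal sketch in Snowflake SQL: compute is provisioned as virtual warehouses that can be created, resized, and suspended at any time without touching the data in storage. The warehouse name load_wh and the sizes used below are illustrative assumptions, not part of this article's setup.

-- Create a small virtual warehouse for loading (name and size are illustrative)
CREATE WAREHOUSE IF NOT EXISTS load_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;

-- Scale compute up or down independently of the data stored in Snowflake
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'MEDIUM';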
Easily migrate your data from FTP to Snowflake and unlock powerful analytics capabilities! Seamlessly transfer your files for enhanced data management and faster insights, all while enjoying Snowflake’s scalable architecture. Try Hevo and equip your team to:
- Integrate data from 150+ sources (60+ free sources).
- Simplify data mapping with an intuitive, user-friendly interface.
- Instantly load and sync your transformed data into Snowflake.
Choose Hevo and see why Deliverr says, “The combination of Hevo and Snowflake has worked best for us.”
Get Started with Hevo for Free
Understanding the Methods to Connect FTP to Snowflake
These are the methods you can implement to establish a connection from FTP/SFTP to Snowflake:
Method 1: Using Hevo Data to Connect FTP to Snowflake
Since Hevo is completely managed, your data projects can take off in just a few minutes. Here’s how simple it is to move data from FTP to Snowflake:
Step 1: Configure the Source
Step 2: Integrate Data
For more information on the connectors required to connect SFTP to Snowflake, here are the links you can refer to:
Method 2: Using Custom ETL to Connect FTP to Snowflake
Step 1: Downloading Data Files from FTP Server
As mentioned earlier, SFTP is far more secure than FTP for data transfer. Hence, this section lists the steps to establish a connection to an SFTP server that hosts the data files and to download those files to the local machine.
- Connecting to the SFTP server
Open the command prompt and use the following command syntax at the prompt:
sftp <username>@<server_ip>
Upon successful execution of that command, the prompt will ask for the user’s password. Once that is provided and authenticated, the prompt will change to sftp>, indicating a successful connection to the server.
- Downloading the data files
Once you are in the SFTP server, navigate to the directory that holds the data files. To see the current directory location, use the command pwd (present working directory); to navigate to a different folder/directory, use the cd (change directory) command (cd <directory_path>). Similarly, to view and change the local machine’s current directory, use the commands lpwd and lcd, respectively. After getting to the required local directory, the following get command will download a file onto the local machine:
get <filename>
e.g., get test.csv
or, if multiple files are to be downloaded together:
mget <filename_pattern>
e.g., mget test0*.csv
The steps to download files from an FTP server are similar to those listed above.
Step 2: Uploading Data Files From Client
Once the data files are available on the local machine, the PUT command is used to upload them to a Snowflake stage. Here’s the syntax for the PUT command:
PUT file://<file_path>/<file_name> @<internal_stage>
File_path: The local directory path to the file. For example, a Windows OS path would look something like this:
C:\db\snowflake\load
File_name: The file name has to be appended as the last part of the file path; otherwise, Snowflake may try to upload all the files in that folder.
Internal stage: This is the internal Snowflake location (stage) where the uploaded data files land upon successful execution of the PUT command. To create an internal stage, the following syntax can be used:
CREATE STAGE IF NOT EXISTS internal_stage
The above command creates an internal stage called internal_stage. An internal stage should be defined with a file format that identifies the type of data files it will store. This can be defined explicitly, or, if left undefined (as in the example above), it defaults to CSV.
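For example, here is a minimal sketch of creating a stage with an explicit CSV file format and uploading a file into it. The format name my_csv_format, the Windows path, and the file name test.csv are illustrative assumptions; note that PUT must be run from a client such as SnowSQL rather than the browser worksheet.

-- Define a named CSV file format (name is illustrative)
CREATE FILE FORMAT IF NOT EXISTS my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Attach the file format to the internal stage
CREATE STAGE IF NOT EXISTS internal_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Upload a local file into the stage
PUT file://C:\db\snowflake\load\test.csv @internal_stage;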
An internal stage can then be referenced using one of the following three options:
- Named internal stage
- The internal stage for a specific table
- The internal stage for the current user
The first option is commonly used, and the syntax for it looks like this:
@[namespace.]<internal_stg_name>[/<path>]
namespace: an optional parameter indicating the database/schema in which the stage resides.
path: also an optional parameter. It typically points to a specific folder within the stage; if it points to a folder holding multiple data files, all of them are processed during the upload (see the example below).
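For instance, a fully qualified reference to the stage created earlier, with a folder path, might look like the following. The database name mydb, schema name myschema, and folder daily are illustrative assumptions.

-- Upload into a specific folder of a fully qualified named stage
PUT file://C:\db\snowflake\load\test.csv @mydb.myschema.internal_stage/daily;

-- List the files currently sitting in that folder of the stage
LIST @mydb.myschema.internal_stage/daily;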
Some optional parameters can be used with the PUT command. Here are a couple of examples, combined into a single command after this list:
- AUTO_COMPRESS = TRUE | FALSE
This option can be set to TRUE if you want Snowflake to gzip-compress the uploaded data file (if the file is not already in that format).
- OVERWRITE = TRUE | FALSE
This can be set to TRUE to overwrite an existing staged file that has the same name as the file currently being uploaded. This option is also quite helpful for forcing an overwrite if the data file in the stage is corrupted or incomplete.
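Putting these options together, a sketch of a complete PUT command might look like this (the path and file name are again illustrative):

PUT file://C:\db\snowflake\load\test.csv @internal_stage
  AUTO_COMPRESS = TRUE
  OVERWRITE = TRUE;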
Step 3: Loading Staged Data Files into the Snowflake Data Warehouse
The command that loads the staged files into a Snowflake table is COPY INTO. There are a couple of ways to use the COPY command, depending on how many files are to be copied and how they are named:
- Files
If there are only a few files to be copied and it is easy to name them individually as a list, the following form of the COPY command can be used:
copy into mynewtable from @internal_stage files=('abc1.csv', 'xyz2.csv');
- Pattern
If there are more files but they follow a certain naming pattern, the following regex-based pattern-matching COPY command can be used:
copy into mynewtable from @internal_stage pattern='.*abc[0-9]+\.csv';
Snowflake maintains metadata that stores the status of every attempted data file load. Hence, Snowflake can easily recognize a duplicate data load and prevent it from happening.
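If you want to inspect that load metadata yourself, Snowflake exposes it through the COPY_HISTORY table function. A quick sketch, assuming the mynewtable example above and a 24-hour lookback window:

SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'MYNEWTABLE',
  START_TIME => DATEADD(hours, -24, CURRENT_TIMESTAMP())
));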
Step 4: Data Transformation and Automatic Data Loading
Snowflake provides options to perform some fundamental data transformations during the data loading phase, i.e., while data is moved from the internal stage to the Snowflake table. This comes in handy because it eliminates the need to temporarily store the data in a data warehouse staging area, apply transformations there, and then load the data into the final table. The following are some of the data transformation options provided by Snowflake:
- Column Reordering
Here is an example of a COPY command using a SELECT statement to reorder the columns of a data file before going ahead with the actual data load:
copy into abc_test(ID, type, area, cost) from (select a.$1, a.$4, a.$2, a.$3 from @internal_stage a) file_format = (format_name = csvtest);
In the above example, the order of columns in the staged data is different from that of the abc_test table; hence, the SELECT statement ensures the columns are appropriately mapped, using the $<number> syntax, to match the column order of abc_test.
- Column Omissions
copy into abc_test(ID, type, area, cost) from (select b.$1, b.$4, b.$6, b.$8 from @internal_stage b) file_format = (format_name = csv1);
In the above example, only a subset of the columns in the data file is copied. This subset is chosen (via the $<number> syntax) based on the fields in the destination table and the position of those fields in the staged data.
- Auto Increment Column
Let’s say that the abc_test table has been created with the first column (ID) as an auto-increment column. The following COPY command will copy over the data from the data file while generating number sequences automatically for the ID column:
copy into abc_test (type, cost) from (select $4, $8 from @internal_stage);
As you can see, the ID column is not mentioned in the above COPY INTO statement; however, as the records are inserted into the table, ID number sequences are generated automatically. A sample table definition that enables this is shown below.
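For reference, a table like abc_test could be defined along these lines; the column types are assumptions based on the examples above, with AUTOINCREMENT generating the ID values:

CREATE TABLE IF NOT EXISTS abc_test (
  ID    INTEGER AUTOINCREMENT,  -- sequence values generated by Snowflake
  type  VARCHAR,
  area  VARCHAR,
  cost  NUMBER(10, 2)
);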
Snowpipe is a utility provided by Snowflake to automatically detect and ingest staged files as soon as they become available. Using this feature, you don’t have to run multiple COPY commands manually to keep your data up to date.
Here’s a summary of the steps to set up Snowpipe for continuous data loading:
- Define a new pipe object with the required COPY INTO statement: this COPY statement should specify the appropriate source stage, SELECT statement, and destination table, as in the examples above (see the sample pipe definition after this list).
- Through your client application, call the public REST endpoints provided by Snowflake with a list of file names and a reference to the previously created pipe.
- If those file names are found in the stage location, they are queued (in an ingest queue) to be loaded into the destination defined in the pipe.
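Here is a minimal sketch of the first step, reusing the abc_test table and internal_stage from the earlier examples; the pipe name abc_test_pipe is an illustrative assumption. Once the pipe exists, your client application submits file names to it through Snowpipe's insertFiles REST endpoint or one of the Snowpipe SDKs.

CREATE PIPE IF NOT EXISTS abc_test_pipe AS
  COPY INTO abc_test
  FROM @internal_stage
  FILE_FORMAT = (FORMAT_NAME = 'csvtest');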
Limitations of Using Custom ETL to Connect FTP to Snowflake
- Undertaking and managing this data engineering task in-house via the custom ETL method described above, reliably and robustly, may prove more challenging than it appears on paper. Since you are building the ETL from scratch, it comes at a significant cost in time.
- The setup is brittle because each part of the infrastructure is assembled manually. Any minor change at the source or destination can break the data flow and even lead to data loss.
- In addition, data cannot be loaded in real time using the above method. You would need to set up additional cron jobs to run this process at a set frequency.
- Since you are moving critical data into the warehouse, you will need to proactively monitor the infrastructure and the data loaded into Snowflake to ensure there are no inconsistencies. You must set up reliable notification and alerting systems to stay on top of this project.
- Implementing a solution that takes care of these hassles seamlessly and lets you shift your focus to generating insights from the data in Snowflake might be the decision you want to make, and this brings us back to Hevo.
Conclusion
This blog talks about the two methods you can use to set up a connection from FTP to Snowflake: using custom ETL scripts to load data manually or with the help of Hevo’s no-code data pipeline.
Extracting complex data from a diverse set of data sources can be a challenging task, and this is where Hevo saves the day! Hevo seamlessly integrates with FTP and Snowflake, ensuring that you see no delay in setup and implementation. It helps you directly transfer data from FTP or any other source of your choice to a data warehouse, business intelligence tool, or any other desired destination in a fully automated and secure manner, without having to write any code, and provides a hassle-free experience. Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
FAQ FTP to Snowflake
Does Snowflake Support SFTP?
Snowflake does not natively support direct SFTP (Secure File Transfer Protocol) as a built-in feature for data loading. However, you can use external tools and services to transfer files via SFTP to a staging area, such as an Amazon S3 bucket or an Azure Blob Storage, and then load the data into Snowflake from there.
How to Transfer Data Using FTP?
To transfer data using FTP (File Transfer Protocol), follow these general steps:
1. Connect to FTP Server
2. Navigate to Desired Directory
3. Upload or Download Files
4. Verify Transfer
5. Disconnect
How to Load Data into Snowflake from a Local File?
1. Prepare the File
2. Stage the File
3. Load the Data
4. Verify the Load
5. Clean Up
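Condensed into Snowflake SQL, the core of this flow is only a handful of statements. The table name target_table and the local path below are illustrative assumptions, and the PUT command must be run from a client such as SnowSQL:

-- 1-2. Stage the local file in the table's own stage
PUT file:///tmp/local_data.csv @%target_table;

-- 3. Load the staged file into the table
COPY INTO target_table
  FROM @%target_table
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- 4. Verify the load
SELECT COUNT(*) FROM target_table;

-- 5. Clean up the staged file
REMOVE @%target_table PATTERN = '.*local_data.csv.*';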