This article talks about a specific Data Engineering scenario where data gets moved from a Secure File Transfer Protocol/File Transfer Protocol i.e. SFTP or FTP to Snowflake. Let’s get into the exact steps of this process, and at the end, a quick overview of the source and destination in this data flow.

This blog covers the first approach to moving data from FTP to Snowflake in detail. It also sheds light on the limitations of this approach so that you can take the path that suits your use case best.

Understanding the Methods to Connect FTP to Snowflake

These are the methods you can implement to establish a connection from FTP/SFTP to snowflake:

Method 1: Using Hevo Data to Connect FTP to Snowflake

Hevo is the only real-time ELT No-code Data Pipeline to directly transfer data from FTP to Snowflake platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

GET STARTED WITH HEVO FOR FREE

Since Hevo is completely managed, your data projects can take off in just a few mins. Here’s how simple it is to move data from FTP to Snowflake:

  • Configure Source: Connect Hevo Data with your FTP Server by providing a unique name for your pipeline along with information about the type of server you want to with. You will further need to provide your credentials such as username and password to allow Hevo access to it, along with details about the associated port and host values. You can also specify the type of file you’ll want to transfer, choosing between XML, JSON, and CSV. In case you want to connect using SSH, you can also enable the option for the same.
FTP to Snowflake: Configuring Source
Configure SFTP/FTP to Hevo Data
  • Integrate Data: Complete FTP to Snowflake migration by providing your destination name, account name, region of your account, database username and password, database and schema name, and the Data Warehouse name.
FTP to Snowflake: Configuring Destination
Configure Snowflake Destination

For more information on the connectors required to connect to SFTP to Snowflake, here are the links you can refer:

Here are more reasons to love Hevo:

  • Schema ManagementHevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Data Transformation: Hevo’s Pre and Post Load Transformations help you prepare your data for analysis.
  • Easy Setup and Implementation – Hevo is a self-serve, managed data integration platform. You can cut down your project timelines drastically as Hevo can help you move data from SFTP/FTP to Snowflake in minutes.
  • 150+ Pre-built integrations – In addition to SFTP/FTP, Hevo can bring data from 150+ other data sources into Snowflake in real-time. This will ensure that Hevo is the perfect companion for your business’s growing data integration needs.
  • Complete Monitoring and Management – In case the FTP server or Snowflake data warehouse is not reachable, Hevo will re-attempt data loads in a set instance ensuring that you always have accurate, up-to-date data in Snowflake.

We can now generate over 100 reports daily, Thanks to Hevo. This is a 5X growth from before. Business teams are wowed by the speed and accuracy at which we operate!

– Swati Singhi, Lead Engineer, Curefit

No ETL Scripts or Cron Jobs. Hevo will now take care of delivering data in a secure and reliable fashion to your Snowflake Data Warehouse. Try our 14-day full-feature access free trial!

Get Started with Hevo for Free

Method 2: Using Custom ETL to Connect FTP to Snowflake

The broad steps to this approach include:

Step 1: Downloading Data Files from FTP Server

As touched on earlier, in terms of data transfer security, SFTP is a lot more secure than FTP. Hence, this section is going to list the steps of establishing a connection to an SFTP server that has the data files and has them downloaded to the local machine.

  • Connecting to the SFTP server
    Open up the command prompt and use the following command syntax at the prompt –
    sftp username@*serverip*

    On successful execution of that command, the prompt will likely ask for the password of that username. Once that is provided and authenticated, the prompt will change to SFTP, indicating a successful connection established with the server.

  • Downloading the data files
    Once you are in the SFTP server, you need to navigate to the directory that has the data files. To understand the current directory location you can use the command pwd (stands for present working directory), and to navigate to a different folder/directory, the cd (stands for change directory) command (cd *directory_path*) can be used. Similarly, to view the current path on the local machine and to change the local machine directory, the commands lpwd and lcd can be used, respectively. After getting to the required local directory, the following get command will download the file onto the local machine –
    get *filename*  ex:  get test.csv

    or in case of multiple files to be downloaded together,

    mget *filenamepattern*  ex:  mget test0*.csv

The steps to download files from an FTP server are quite similar to the steps listed above.

Step 2: Uploading Data Files from Client

Once the data files are available in the local machine, the PUT command is used to upload the files onto a Snowflake stage. Here’s the syntax for the PUT command

PUT file://<file_path>/<file_name> internalstage

File_path: The local directory folder path to the file. For example, a Windows OS path would look something like this:

C:dbsnowflakeload

File_name: The file name has to be mentioned as the last string to the file path or Snowflake might try to upload all the files in that folder path.

Internal stage: This is the first internal Snowflake location (stage) where the uploaded data files reside on the successful execution of the PUT command. To create an internal stage, the following syntax can be used

CREATE STAGE IF NOT EXISTS internal_stage

The above command creates an internal stage called internal_stage. An internal stage should be defined with a file format, to identify the type of data files it will be storing. This can be explicitly defined, or if undefined (as in the example above), it will be set to CSV by default.

An internal stage can then be defined using one of the following three options:

  • Named internal stage
  • The internal stage for a specific table
  • The internal stage for the current user  

The first option is commonly used and the syntax for that looks like – 

@[namespace.]<internal_stg_name>[/<path>]

namespace – an optional parameter indicating the database/schema in which the stage resides,
path – this is also an optional parameter. This is typically set to point to a specific file, if however, it points to a folder with multiple data files, during the upload, all of them get uploaded. 

Some optional parameters can be used with the PUT command. Here are a couple of examples – 

  • AUTO_COMPRESS = TRUE | FALSE

This option can be set to TRUE if you want Snowflake to gzip compress the data file being uploaded (if the file is not already in that format).

  • OVERWRITE = TRUE | FALSE

This can be set to TRUE to overwrite an existing file with the same file name that is currently being uploaded. Also, this option is quite useful to force an overwrite in case the data file in the stage is corrupted or has bad data.  

Step 3: Downloading Staged Data File to a Snowflake Data Warehouse

The command used to download the staged files onto Snowflake is called ‘copy’. There are a couple of ways to use the copy command depending on how many files are to be copied and how they are named:

  • Files
    If there are only a few files to be copied and if it is easy to just name them separately as a list, the following form of copy command can be utilized

     

    copy into mynewtable from @%internal_stage files=('abc1.csv', 'xyz2.csv');
  • Pattern
    In case there are more files but they follow a certain pattern in regards to their naming, then the following regex-based pattern matching copy command can be used

     

    copy into mynewtable from @%internal_stage pattern='abc[^0-9].csv';

    Snowflake manages a metadata table that stores the status of all the data file copies that were attempted. Hence, it is easy for Snowflake to realize a duplicate data load and prevent it from happening.

Step 4: Data Transformation and Automatic Data Loading

Snowflake provides options to do some basic data transformations during the data loading phase i.e. while data is moved from the internal stage to the Snowflake table. This comes in handy as this eliminates the need for storing the data temporarily in a stage in the data warehouse, applying transformations on that, and then loading the data into the final table. The following are some of the data transformation options provided by Snowflake

  • Column Reordering
    Here is an example of a COPY command using a select statement to reorder the columns of a data file before going ahead with the actual data load
    copy into abc_test(ID, type, area, cost) from (select a.$1, a.$4, a.$2, a.$3  from @internal_stage a) file_format = (format_name = csvtest);

    In the above example, the order of columns in the staged data is different from that of the abc_test table, hence, the select statement ensures columns are appropriately mapped using the $*number* syntax to match with the order of abc_test.

  • Column Omissions

     

    copy into abc_test(ID, type, area, cost) from (select b.$1, b.$4, b.$6, b.$8  from @internal_stage b) file_format = (format_name = csv1);

    In the above example, only a subset of the columns available on the data file is copied. This subset is chosen (via $*number* syntax) based on the fields in the destination table and the order of those fields in the staged data

  • Auto Increment Column
    Let’s say that the abc_test table has been created with the first column (ID) as an auto-increment column

     

    id number autoincrement start 0 increment 1

    The following copy command will copy over the data from the data file while generating number sequences automatically for the ID column

    copy into abc_test (type, cost)  from (select $4, $8 from @internal_stage);

    As you can see, the ID column is not mentioned in the above copy into the statement, however, while the records are inserted in the table, ID number sequences are automatically generated.

Snowpipe is a utility provided by Snowflake to automatically detect and ingest staged files as and when they become available. Using this feature, you don’t have to run multiple copy commands manually to keep your data up-to-date.

Here’s the summary of the steps to set up Snowpipe for continuous data loading

  • Define a new pipe object with the required COPY INTO statement – this copy statement should be defined with the appropriate source, select statement, and destination locations, as provided in the examples before.
  • Through your client application, call/hit the public REST endpoints/APIs provided by Snowflake with a list of filenames and the previously created pipe reference.
  • If the above set of filenames is found in the stage location, they are queued (ingest queue) to be loaded into the destination obtained from the pipe.

Limitations of Using Custom ETL to Connect FTP to Snowflake

  • Undertaking and managing this data engineering task in-house, via the steps mentioned in the custom ETL method, reliably and robustly, may prove to be more challenging than it appears on paper. Given you are custom building ETL from scratch, this would come at the cost of time.
  • Since each part of the infrastructure is assembled manually, the setup is brittle. Any minor change at the source or destinations can break the flow of data leading to irretrievable loss of data. 
  • In addition to this, the data cannot be loaded in real-time using the above method. This would need you to take up additional cron jobs that can run this setup at a set frequency. 
  • Since you are moving critical data into the warehouse, you will need to proactively monitor both the infrastructure and data loaded into Snowflake to ensure that there is no inconsistency. You will need to set up stable notification and alert systems to ensure that you are on top of this project. 
  • Implementing a solution that takes care of these hassles seamlessly, that lets you shift your focus onto generating insights from the data on Snowflake might be the decision you’d want to make – and this brings us back to Hevo. 

Before we wrap up, let’s cover some basics.

Introduction to SFTP/FTP

SFTP/FTP Snowflake: Jira Logo
Jira Logo

Secure File Transfer Protocol/File Transfer Protocol (SFTP/FTP) are network protocols that facilitate data transfer between client and server systems. The basic difference between the two protocols is the added security layer in SFTP that establishes a secure connection between the client and the server based on an authentication method (such as username/password). 

Introduction to Snowflake

SFTP/FTP to Snowflake: Snowflake Logo
Image Source

Snowflake is a popular Cloud Data Warehouse, that is primarily relational in nature, which also supports semi-structured data formats like ORC, JSON, XML, etc. Data can be ingested into Snowflake from a variety of sources including popular Cloud storage like Amazon S3, GCS, etc. In this blog, however, we are focusing our attention on data transfer from FTP to Snowflake.

The following section describes the steps to be followed to develop a custom ETL solution to achieve the migration from FTP to Snowflake. However, in case you decide to opt for a third-party solution that can help you with creating and managing this data pipeline, Hevo Data is a Cloud ETL platform that offers such a service, along with many more Data Engineering pipeline management services.

Conclusion

This blog talks about the two methods you can use to set up a connection from FTP to Snowflake: using custom ETL scripts to load data manually or with the help of Hevo’s no-code data pipeline.

Extracting complex data from a diverse set of data sources can be a challenging task and this is where Hevo saves the day! Hevo seamlessly integrates with FTP and Snowflake ensuring that you see no delay in terms of setup and implementation. It helps you directly transfer data from FTP or any other source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner without having to write any code and will provide you with a hassle-free experience.

Learn more about Hevo[/hevoButton]

Want to take Hevo for a spin? Signup for a 14-day free trial and experience the feature-rich Hevo suite first hand.

What are your thoughts about moving data from FTP to Snowflake? Let us know in the comments. 

mm
Freelance Technical Content Writer, Hevo Data

Avinash specializes in freelance writing within the data industry, delivering informative and engaging content covering data analytics, machine learning, AI, big data, and business intelligence.

No-code Data Pipeline For Snowflake

Get Started with Hevo