Is your data in Google Sheets becoming too large for on-demand analytics? Are you struggling to combine data from multiple Google Sheets into a single source of truth for reports and analytics? If that’s the case, then your business may be ready for a move to a mature data platform like Snowflake. This post covers two approaches for migrating your data from Google Sheets to Snowflake.
Introduction to Google Sheets
Spreadsheets are electronic worksheets organized into rows and columns, into which users can enter, manage, and perform mathematical operations on their data. They also give users the ability to create tables, charts, and graphs for analysis.
Google Sheets is a spreadsheet program offered by Google as part of its Google Docs Editors suite. This suite also includes Google Drawings, Google Slides, Google Forms, Google Docs, Google Keep, and Google Sites.
Google Sheets also offers a wide variety of pre-made templates, such as schedules and budgets, designed to save you time and make your work easier.
Here are a few ways businesses commonly use Google Sheets:
- Many businesses use Google Sheets to capture interaction data at their various web properties including landing pages, contact forms, feedback forms, and surveys.
- During the experimental stage of web development projects, some businesses may use Google Sheets as a database due to its familiar Excel-like interface and the fact that data in Google Sheets can be downloaded as Excel or CSV files for use in report generation or preliminary analytics.
- As the data in a Google Sheet grows, businesses may find it increasingly difficult to perform on-demand analytics: the manual ETL (Extract-Transform-Load) process may take too much time, or become altogether infeasible. This makes it necessary to migrate the data captured in multiple Google Sheets to a Data Warehouse for faster, integrated analytics.
Introduction to Snowflake
Snowflake is a fully managed service that provides customers with near-infinite scalability of concurrent workloads to easily integrate, load, analyze, and securely share their data. Its common applications include data lakes, data engineering, data application development, data science, and secure consumption of shared data.
Snowflake’s unique architecture separates compute from storage. This enables your users and data workloads to access a single copy of your data concurrently, without any detrimental effect on performance. With Snowflake, you can seamlessly run your data solution across multiple regions and Clouds for a consistent experience. Snowflake makes this possible by abstracting away the complexity of the underlying Cloud infrastructure.
Here are a few key features of Snowflake:
- Snowflake is a Cloud Data Warehouse operating on a SaaS (software-as-a-service) model.
- Snowflake runs on public Cloud infrastructure (originally AWS (Amazon Web Services), and now also Microsoft Azure and Google Cloud Platform) and offers an architecture based on a new SQL database engine that enables faster, more dynamic queries with performant data integration.
- Snowflake’s analytics layer integrates with modern tools and services that facilitate and optimize Data Science exploration.
Understanding the Methods to Connect Google Sheets to Snowflake
These are the methods you can use to connect Google Sheets to Snowflake in a seamless fashion:
Method 1: Using Migration Scripts to Connect Google Sheets to Snowflake
To migrate your data from Google Sheets to Snowflake, you may opt for a custom-built data migration script to get the job done.
We will demonstrate this process in the following sections. To proceed, you will need the following:
Prerequisites
- A data migration code project. I have prepared a sample migration project for this demonstration.
- A Linux server for running the migrations on a schedule using Cron Jobs.
- Google Sheets API access is enabled for the Google Sheets you want to migrate.
- A Snowflake account configured with the databases and tables you will be migrating to.
Step 1: Setting Up Google Sheets API Access for Google Sheets
As a first step, you need to set up Google Sheets API access for the affected Google Sheets. Start by doing the following:
1. Log in to the Google account that owns the Google Sheets
2. Point your browser to the Google Developer Console at console.developers.google.com
3. After the console loads, create a project by clicking the “Projects” dropdown and then clicking “New Project”
4. Give your project a name and click “Create”
5. After that, click “Enable APIs and Services”
6. Search for “Google Sheets API” in the search bar that appears and select it
7. Click “Enable” to enable the Google Sheets API
8. Click on the “Credentials” option on the left navbar in the view that appears, then click “Create Credentials”, and finally select “Service Account”
[Screenshot: Click on Credentials]
9. Provide a name for your service account. You will notice it generates an email format for the Service Account ID. In my example in the screenshot below, it is “gsheets-migration@migrate-268012.iam.gserviceaccount.com”. Take note of this value. The token “migrate-268012” is the name of the project I created while “gsheets-migration” is the name of my service account. In your case, these would be your own supplied values.
10. Click “Create” and fill out the remaining optional fields. Then click “Continue”
[Screenshot: Create Service Account]
11. In the view that appears, click “Create Key”, select the “JSON” option and click “Create” to download your key file (credentials). Please store it in a safe place. We will use this later when setting up our migration environment.
12. Finally, click “Done”.
At this point, all that remains for the Google Sheets setup is the sharing of all the Google Sheets you wish to migrate with the email-format Service Account ID mentioned in step 9 above.
Note: You can copy your Service Account ID from the “client_email” field of the credentials file you downloaded.
[Screenshot: Share Google Sheets]
For this demonstration, I will be migrating a sheet called “data-uplink-logs” shown in the screenshot below. I will now share it with my Service Account ID:
Click “Share” on the Google sheet, paste in your Service Account ID, and click “Send”. Repeat this process for all sheets you want to migrate. Ignore any “mail delivery subsystem failure” notifications you receive while sharing the sheets, as your Service Account ID is not designed to operate as a normal email address.
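To quickly confirm that the API access and sharing are working, you can read the sheet with a few lines of Python. This is a minimal sketch, assuming your downloaded key file is saved as googlesheets.json and your sheet is named data-uplink-logs; it uses the gspread and oauth2client libraries that we install later in this post.

# Minimal sketch: verify service-account access to a shared Google Sheet.
import gspread
from oauth2client.service_account import ServiceAccountCredentials

SCOPES = ["https://spreadsheets.google.com/feeds",
          "https://www.googleapis.com/auth/drive"]

# Authorize with the key file downloaded in step 11.
creds = ServiceAccountCredentials.from_json_keyfile_name("googlesheets.json", SCOPES)
client = gspread.authorize(creds)

# Open the shared sheet by name and print the first few rows.
# Note: opening by name may also require the Google Drive API to be
# enabled for your project; alternatively, use client.open_by_url(...).
worksheet = client.open("data-uplink-logs").sheet1
print(worksheet.get_all_values()[:5])

If this prints your sheet’s rows, the service account is correctly configured.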
Step 2: Configuring Target Database in Snowflake
We’re now ready to get started on the Snowflake side of the configuration process, which is simpler.
To begin, create a Snowflake account. Creating an account furnishes you with all the credentials you will need to access Snowflake from your migration script.
Specifically:
- After creating your account, you will be redirected to your Cloud Console which will open up in your browser
- During the account creation process, you would have specified your chosen username and password. You would have also selected your preferred AWS region, which will be part of your account identifier.
- Your Snowflake account is of the form <Your Account ID>.<AWS Region> and your Snowflake cloud console URL will be of the form https://<Your Account ID>.<AWS Region>.snowflakecomputing.com/
- Prepare and store a JSON file with these credentials. It will have the following layout:
{ "user": "<Your Username>", "account": "<Your Account ID>.<AWS Region>", "password": "<Your Password>" }
After storing the JSON file, take some time to create your target environment on Snowflake using the intuitive User Interface.
You are initially assigned a virtual Warehouse called COMPUTE_WH, so you can go ahead and create a Database and tables.
After providing a valid name for your database and clicking “Finish”, click the “Grant Privileges” button which will show the form in the screenshot below.
Select the “Modify” privilege and assign it to your schema name (which is “PUBLIC” by default). Click “Grant”. If necessary, click “Cancel” after that to return to the main view.
The next step is to add a table to your newly created database. You do this by clicking the database name on the left display and then clicking on the “Create Table” button. This will pop up the form below for you to design your table:
After designing your table, click “Finish” and then click on your table name to verify that your table was created as desired.
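If you prefer scripting this setup rather than clicking through the UI, the same database and table can be created through the Python connector. This is a sketch under assumptions: it reuses the snowflake.json credentials file from above, and the column types shown are illustrative guesses for the demo table used later in this post.

# Sketch: create the demo database and table programmatically instead of via the UI.
import json
import snowflake.connector

with open("snowflake.json") as f:
    conn = snowflake.connector.connect(**json.load(f))

cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS TEST_GSHEETS_MIGRATION")
# Column names match the example command in Step 3; the types are assumptions.
cur.execute("""
    CREATE TABLE IF NOT EXISTS TEST_GSHEETS_MIGRATION.PUBLIC.GSHEETS_MIGRATION (
        CLIENT_ID       STRING,
        NETWORK_TYPE    STRING,
        BYTES           NUMBER,
        UNIX_TIMESTAMP  NUMBER
    )
""")
conn.close()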
Finally, open up a Worksheet pane, which will allow you to run queries on your table. Do this by clicking on the “Worksheets” icon, and then clicking on the “+” tab.
You can now select your database from the left pane to start running queries.
We will run queries from this view to verify that our data migration process is correctly writing our data from the Google sheet to this table.
We are now ready to move on to the next step.
Step 3: Preparing a Migration Environment on Linux Server
In this step, we will configure a migration environment on our Linux server.
- SSH into your Linux instance. I am using a remote AWS EC2 instance running Ubuntu, so my SSH command is of the form
ssh -i <keyfile>.pem ubuntu@<server_public_IP>
- Once in your instance, run sudo apt-get update to update the environment
- Next, create a folder for the migration project and enter it
sudo mkdir migration-test; cd migration-test
- It’s now time to clone the migration script we created for this post:
sudo git clone https://github.com/cmdimkpa/Google-Sheets-to-Snowflake-Data-Migration.git
- Enter the project directory and view contents with the command:
cd Google-Sheets-to-Snowflake-Data-Migration; ls
This reveals the following files:
[Screenshot: Project directory contents]
- googlesheets.json: copy your saved Google Sheets API credentials into this file.
- snowflake.json: likewise, copy your saved Snowflake credentials into this file.
- migrate.py: this is the migration script.
Using the Migration Script
Before using the migration script (a Python script), we must ensure the required libraries for both Google Sheets and Snowflake are available in the migration environment. Python itself should already be installed; this is usually the case on Linux servers, but confirm it before proceeding (for example, by running python --version).
To install the required packages, run the following commands:
sudo apt-get install -y libssl-dev libffi-dev
pip install --upgrade snowflake-connector-python
pip install gspread oauth2client PyOpenSSL
At this point, we are ready to run the migration script.
The required command is of the form:
sudo python migrate.py <Source Google Sheet Name>
<Comma-separated list of columns in the Google Sheet to Copy>
<Number of rows to copy each run> <Snowflake target Data Warehouse>
<Snowflake target Database> <Snowflake target Table> <Snowflake target table Schema>
<Comma-separated list of Snowflake target table fields> <Snowflake account role>
For our example process, the command becomes:
sudo python migrate.py data-uplink-logs A,B,C,D 24
COMPUTE_WH TEST_GSHEETS_MIGRATION GSHEETS_MIGRATION PUBLIC CLIENT_ID,NETWORK_TYPE,BYTES,UNIX_TIMESTAMP SYSADMIN
To migrate 24 rows of incremental data (each run) from our test Google Sheet data-uplink-logs to our target Snowflake environment, we simply run the command above. The following is a screenshot of what follows:
The reason we migrate only 24 rows at a time is to stay under the rate limit for the free tier of the Google Sheets API. Depending on your plan, you may not have this restriction.
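For readers curious about what happens under the hood, the following is a simplified illustration of the core loop such a migration script implements: read a batch of rows from the Google Sheet, then insert them into the Snowflake table. To be clear, this is not the actual migrate.py from the repository; the real script parses its command-line arguments, tracks an incremental offset between runs, and handles errors.

# Simplified illustration of a Google Sheets -> Snowflake batch copy.
# Not the actual migrate.py: argument parsing, incremental offsets,
# and error handling are omitted.
import json
import gspread
import snowflake.connector
from oauth2client.service_account import ServiceAccountCredentials

BATCH_SIZE = 24  # stay under the Google Sheets API free-tier rate limit

SCOPES = ["https://spreadsheets.google.com/feeds",
          "https://www.googleapis.com/auth/drive"]
gcreds = ServiceAccountCredentials.from_json_keyfile_name("googlesheets.json", SCOPES)
rows = gspread.authorize(gcreds).open("data-uplink-logs").sheet1.get_all_values()

with open("snowflake.json") as f:
    conn = snowflake.connector.connect(**json.load(f))

# Insert one batch (columns A-D) into the target table.
conn.cursor().executemany(
    "INSERT INTO TEST_GSHEETS_MIGRATION.PUBLIC.GSHEETS_MIGRATION "
    "(CLIENT_ID, NETWORK_TYPE, BYTES, UNIX_TIMESTAMP) VALUES (%s, %s, %s, %s)",
    [row[:4] for row in rows[:BATCH_SIZE]],
)
conn.close()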
Step 4: Testing the Migration Process
To test that the migration ran successfully, we simply go to our Snowflake Worksheet which we opened earlier, and run the following SQL query:
SELECT * FROM TEST_GSHEETS_MIGRATION.PUBLIC.GSHEETS_MIGRATION
Indeed, the data is there. So the data migration effort was successful.
Step 5: Running Cron Jobs
As a final step, set up cron jobs as required to run the migration on a schedule. A full treatment of cron is beyond the scope of this post, but the example below shows the general idea.
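As an illustration (the installation path here is an assumption; adjust it to your environment), a crontab entry like the following would run the example migration at the top of every hour and append its output to a log file:

0 * * * * cd /home/ubuntu/migration-test/Google-Sheets-to-Snowflake-Data-Migration && python migrate.py data-uplink-logs A,B,C,D 24 COMPUTE_WH TEST_GSHEETS_MIGRATION GSHEETS_MIGRATION PUBLIC CLIENT_ID,NETWORK_TYPE,BYTES,UNIX_TIMESTAMP SYSADMIN >> migration.log 2>&1

You would add this with crontab -e on the Linux server prepared in Step 3.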
This concludes the first approach! I hope you were as excited reading it as I was writing it. It’s been an interesting journey; now let’s review the drawbacks of this approach.
Limitations of using Migration Scripts to Connect Google Sheets to Snowflake
The migration script approach to connect Google Sheets to Snowflake works well, but it has the following drawbacks:
- This approach requires dedicating a few engineers to set up and test the infrastructure. Once it is built, you would also need a dedicated engineering team that can constantly monitor the infrastructure and provide immediate support if and when something breaks.
- Aside from the setup process, which can be intricate depending on experience, this approach creates new operational requirements, such as:
- The need to monitor the logs and ensure the uptime of the migration processes.
- Fine-tuning of the cron jobs to ensure optimal data transmission with respect to the data inflow rates of the different Google sheets, any Google Sheet API rate limits, and the latency requirements of the reporting or analytics processes running on Snowflake or elsewhere.
Method 2: Using Hevo Data to Connect Google Sheets to Snowflake
Hevo is an automated data pipeline that provides an easy-to-use, code-free User Interface for copying data without writing a single line of code. Hevo enables the lowest time to production for such copy operations, allowing developers to focus on their core business logic rather than waste time on configuration nightmares.
Hevo provides an easy-to-use data integration platform that works by building an automated pipeline in just two interactive steps:
- Step 1: Configure Google Sheets as a source, by entering the Pipeline Name and the spreadsheet you wish to replicate.
[Screenshot: Configure Google Sheets Source]
- Step 2: Complete Google Sheets to Snowflake migration by providing your destination name, account name, region of your account, database username and password, database and schema name, and the Data Warehouse name.
[Screenshot: Configure Snowflake Destination]
And Hevo automatically takes care of the rest. It’s just that simple: you are now ready to start migrating data from Google Sheets to Snowflake in a hassle-free manner! You can also integrate data from numerous other free data sources, such as Zendesk, into the destination of your choice, such as Snowflake, in a jiffy.
You can explore more about Hevo here.
Hevo is also much faster, thanks to its highly optimized features and architecture. Some of the additional features you can also enjoy with Hevo are:
- Hassle-free Setup and Implementation – Your data integration project can be set up in just a few minutes with Hevo. After that, everything runs automatically.
- Transformations – Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the pipelines you set up. To carry out a transformation, you edit the properties of the event object received as a parameter by the transform method. Hevo also offers drag-and-drop transformations like Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before putting them to use (see the sketch after this list).
- Connectors – Hevo supports 150+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; MySQL, MongoDB, TokuDB, DynamoDB, and PostgreSQL databases to name a few.
- Full Monitoring and Data Management – Hevo automatically manages your data loads and ensures you always have up-to-date and accurate data in Snowflake.
- Automatic Change Data Capture – Hevo performs incremental data loads automatically through a number of in-built Change Data Capture mechanisms. This means that as and when data in Google Sheets changes, it is loaded into Snowflake in real time.
- Hundreds of Additional Data Sources – In addition to Google Sheets, Hevo can bring data from a bunch of other Free and Paid data sources into Snowflake in real time. This will ensure that you always have a stable data integration partner for your growing data needs.
- Round-the-clock (24×7) Support – Hevo has a dedicated support team available at all points to swiftly resolve any queries and unblock your data integration project.
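To make the event-level transformation model described above more concrete, here is a hypothetical sketch of what a preload transform might look like. The event object and its properties accessor are assumptions based on the description in this post, not Hevo’s exact interface; consult Hevo’s documentation for the actual event API.

# Hypothetical sketch of an event-level preload transformation.
# The event object and its "properties" accessor are assumptions;
# refer to Hevo's documentation for the actual interface.
def transform(event):
    properties = event.properties  # assumed accessor for the event's fields

    # Example: normalize a field and derive a new one before loading.
    properties["NETWORK_TYPE"] = str(properties.get("NETWORK_TYPE", "")).upper()
    properties["BYTES_MB"] = int(properties.get("BYTES", 0)) / (1024 * 1024)

    return event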
It just took us 2 weeks to completely transform from spreadsheets to a modern data stack. Thanks to Hevo that helped us make this transition so smooth and quick. Now all the stakeholders of our management, sales, and marketing team can easily build and access their reports in just a few clicks.
– Matthew Larner, Managing Director, ClickSend
Conclusion
This blog discussed the two different methods you can use to connect Google Sheets and Snowflake in a seamless fashion: using migration scripts, and using a third-party tool, Hevo.
Extracting complex data from a diverse set of data sources can be a challenging task and this is where Hevo saves the day! Hevo offers a faster way to move data from Databases or SaaS applications such as MongoDB into your Data Warehouse like Snowflake to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
As we have seen, Hevo greatly simplifies the process of migrating data from your Google Sheets to Snowflake or indeed any other source and destination. Sign Up for your 14-day free trial and experience stress-free data migration today! You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.