Google Sheets is a free web-based spreadsheet program provided by Google as a part of its Google Apps suite. It allows multiple users to create, edit and collaborate on spreadsheets in real-time. Google Sheets is a free and fully functional spreadsheet program that is compatible with most popular spreadsheet formats. As a cloud-based SaaS utility, your files are accessible from anywhere via computer and/or mobile devices.
Redshift is a data warehouse service provided by Amazon. It’s a part of the Amazon Web Services (AWS suite of services. It was built to be able to scale for and handle large datasets that companies generate in their day to day operations.
Built on an SQL framework, Redshift facilitates analysis of the data it contains, allowing for complex queries, deep analysis, and powerful insights. It allows for the integration of a number of different connecting applications making it even more powerful.
There are several ways that data can be migrated from Google Sheets to Redshift. In this article, we’ll look at two of those options:
Approaches to move data from Google Sheets to Redshift
There are two popular methods to load Google Sheets data to Snowflake Data Warehouse.
Method 1: Manually using CSV files and Amazon S3
This method can be done by most people with a basic understanding of computers and a little know-how regarding using command-line instructions, but it will take a bit of time. More so if you are using a large number of Google spreadsheets or if the spreadsheets themselves are large and are made up of a lot of data. CSV files are very useful as they may be used in a variety of scenarios and situations.
Method 2: Using an Automated Data Pipeline Platform Hevo Data
Connecting Google Sheets is a breeze with Hevo. It is both cost and time-efficient as it is built for ease of use. Since Hevo is fully managed, you just need to set up the data pipeline once and then never revisit. This allows you to focus on your analyzing data instead of wrangling with it.
Let’s delve into each option.
Method 1: Migrating from Google Sheets to Redshift using CSV Files and Amazon S3
Before proceeding, the following elements are required:
- An Amazon S3 account
- A pre-created table in Redshift (duplicating the original sheet’s structure would help).
Steps in Migrating Data
Manually migrating data from Google Sheets to Amazon Redshift is done in three phases
- Extracting the data from Google Sheets as a CSV file.
- Uploading the CSV file to S3.
- Copying the data from S3 to Redshift
We will look at each of these steps in more detail.
- Download the CSV file
- Open the desired spreadsheet in Google Sheets
- Select File > Download as > Comma-separated values (.csv)
Note: When downloading CSV files it must be remembered that you will only be downloading the CSV file for the currently selected sheet. If the desired spreadsheet has multiple worksheets then a CSV file must be downloaded for each of these.
- The file will be downloaded via your browser to your download folder (or whichever folder you choose). Be sure to give the file an appropriate name.
- Repeat for all of the desired spreadsheets and worksheets within them.
- Upload the CSV files to an Amazon S3 Bucket
- Create a bucket in S3
- Sign in to the AWS Management Console
- Open the Amazon S3 console at https://console.aws.amazon.com/s3/
- Click Create Bucket
- Enter an appropriate name in the Bucket Name box.
- Select a region.
- Click Create.
- Create a folder in the new bucket
- Click the name of the newly created bucket.
- Select Actions > Create Folder
- Name the new folder load
Note: Amazon will charge a nominal amount for the time that objects are stored in the bucket. For more info on Amazon S3 pricing see here.
- Upload the files to the newly created folder
- Click the name of the new folder.
- Click Upload
This Upload – Select Files Wizard will be presented,
- Click Add Files
A file selection dialog box will be presented.
- Select all of the CSV files that you have downloaded from Google Sheets.
- Click Start Upload.
- Create a bucket in S3
Note: For ease of use, it’s suggested that you should use the same credentials to create your Amazon S3 bucket as you do to create your Amazon Redshift tables.
- Copy the Data from S3 to Redshift
Use the COPY command to load the data from the files uploaded to S3 into the Redshift database.
COPY table_name FROM 's3://<your-bucket-name>/load/file_name.csv' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' CSV;
If the CSV file has a different column arrangement from the Redshift table, or it does not contain all of the columns, then you will have to specify a column list.
COPY table_name col1, col2, col3, col4) FROM 's3://<your-bucket-name>/load/file_name.csv' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' CSV;
If you wish to ignore the file header rows in the CSV files then you may also specify that.
COPY table_name col1, col2, col3, col4) FROM 's3://<your-bucket-name>/load/file_name.csv' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' CSV IGNOREHEADER1;
Your data should now be accessible and queriable in your Amazon Redshift database.
Google Sheets to Redshift: Potential Issues with Using CSV Files and Amazon S3 for Data Migration
- The biggest concern with this method is time. Your time. Transferring data from Google Sheets to Redshift via this method takes up a lot of time and effort. The task becomes exponentially more difficult the more spreadsheets and worksheets that you have. Plus, given the manual nature of the exercise, there are multiple opportunities for things to go wrong.
- This solution does not ensure you have data available in real-time. If your use case needs you to bring in data in real-time, then you would need to write additional code to achieve this.
Method 2: Migrating Data from Google Sheets to Redshift
Hevo provides an easy-to-use, automated data pipeline platform that can move data from Google Sheets to Redshift in just two interactive steps:
- Authenticate and Connect Google Sheets as a source. Select the sheets that you want to load to Redshift
- Select the Redshift destination where you want to move the data.
Once completed Hevo will immediately begin migrating data from Google Sheets to your Redshift table, allowing you access to your data in real-time. Hevo ensures that your data is loaded in a secure, consistent and reliable fashion. Sign up for a risk-free, 14-day free trial to explore Hevo.
More Reasons to Try Hevo
The ease of using Hevo as a data migration tool, along with its reliability and robust architecture, makes data migration projects a snap. Hevo also offers additional benefits such as:
- Low Time to Implementation – Can connect to the Google sheets data source and move data to Redshift in minutes.
- Fully Managed – Hassle-free migration. Once set up, no manual intervention is needed. Hevo takes care of all maintenance and monitoring activities.
- Alerts and Notification – If there are any errors in incoming data or the pipeline configuration, you will be notified by either Slack or email. This allows you to take timely action.
- Complete Data – Hevo is built on a fault-tolerant architecture that ensures zero data loss.
- Scalability – Can add more sources whenever needed. Check out the other data sources Hevo integrates with here. Hevo can handle data of any size.
While Google Sheets does provide effective cloud storage for personal files, the capabilities for more advanced data analysis and SQL querying make the transfer to Amazon Redshift a good move. Hevo makes that easy for you and ensures your data is securely loaded. To find out how Hevo can be integrated into your business’ data sign up for a free trial here.