Google Analytics to Redshift: Steps to Replicate Data

on Tutorial • June 6th, 2019 • Write for Hevo

Many businesses worldwide use Google Analytics to collect valuable data on website traffic, signups, purchases, customer behaviour, and more. Given the humongous amount of data that is present on Google Analytics, the need to deeply analyse it has also become acute. Naturally, organizations are turning towards Amazon Redshift, one of the widely adopted Data Warehouses of today, to host this data and power the analysis. In this post, you will learn how to move data from Google Analytics to Redshift.

Methods to move data from Google Analytics to Redshift

There are two ways of loading your data from Google Analytics to Redshift

Method 1: Using a fully-managed data pipeline platform such as Hevo Data

Google Analytics comes pre-built “out of the box” integration in Hevo. You can easily move data with minimal setup, configuration from your end. Given Hevo is a fully managed platform, no coding help or engineering bandwidth would be needed. Hevo will ensure that your data in the warehouse, ready for analysis in a matter of just a few minutes.

Method 2: Hand coding the data replication between Google Analytics and Redshift

The activities of extracting data from Google Analytics, transforming that data to a usable form, and loading said data onto the target Redshift database would have to be carried out by custom scripts. The scripts would have to be written by members of your data management or business intelligence team. This data pipeline would then have to be managed and maintained over time.

Let’s take a look at how each of those options would work.

Method 1: Moving Data from Google Analytics to Amazon Redshift Using Hevo

 Using Hevo Data Integration Platform, you can seamlessly replicate data from Google Analytics to Redshift with 2 simple steps:
  1. Configure the data source by authenticating Google Analytics:

  2. Configure the Redshift warehouse where you want to move your Google Analytics data:

    Configuring Redshift Destination on Hevo

Hevo takes up all the grind work ensuring that consistent and reliable data is available for analysis in your Redshift Data Warehouse.

Method 2: Moving Data from Google Analytics to Redshift using Custom Scripts

Pre-Migration Steps

  1. Audit of Source Data
    Before data migration begins, Google Analytics event samples should be reviewed to ensure that the engineering team is completely aware of the schema. Business teams should coordinate with engineering to clearly define the data that needs to be made available. This will reduce the possibility of errors due to expectation mismatch between business and engineering teams
  2. Backup of all Data
    In the case of a failed replication, it is necessary to ensure that all your GA data may be retrieved with zero (or minimal) data loss. Also, plans should be made to ensure that sensitive data is protected at all stages of the migration.

Manual Migration Steps

Google Analytics provides an API, the Google Core Reporting API, that allows engineers to pull data. As such, most of the data that is returned is combined into a consolidated JSON format, which is incompatible with Redshift.
The scripts would need to pull data from GA to a separate object, such as a CSV file. Meanwhile, to prepare the Redshift data warehouse, SQL commands must be run to create the necessary tables that define the database structure.
The aforementioned CSV file must then be loaded to a resource that Redshift can access. Amazon S3 cloud storage service is a good option. There is some amount of preparation involved in configuring S3 for this purpose. The CSV file must then be loaded into the S3 that you configured. The COPY command must be invoked to load the data from the CSV file and into the Redshift database.
Once the transfer is complete queries should be run on the newly populated database to test if the data is accurate and complete. This would re-ensure that the data load was successful.
Having been verified, a cron job should be set up to run with reasonable frequency, ensuring that the Redshift database stays up to date.
Say you have different Google Analytics views set up for Website, App, etc. You would have to end up repeating the above process for each of these.

Potential Issues with Migrating Data from Google Analytics to Redshift Using Custom Scripts

Manual coding for data replication between diverse technologies, while not impossible, does come with its fair share of challenges. Immediate consideration is one of time and cost. While the value of the information to be gleaned from the data is definitely worth the cost of implementation, it is still a considerable cost.
The second concern to arise is of accuracy and effectiveness. How good is the code? How many iterations will it take to get it right? Have effective tests been developed to ensure the accuracy of the migrated data? Have effective process management policies been put in place to ensure correctness and consistency? For instance, how would you identify if GA Reporting API JSON format has been altered? The questions never end.
Should the data load process be mismanaged, serious knock-on effects may result. These may include issues such as inaccurate data being loaded in the form of redundancies and unknowns, missed deadlines, and exceeded budgets as a result of multiple tests and script rewrites and more. 
However, loading data from Google Analytics to Redshift may also be handled by much easily in a hassle-free manner with platforms such as Hevo.

Advantages of using Hevo

The relative simplicity of using Hevo as a data integration platform, coupled with its accuracy takes the difficulty out of your data analysis projects. Here are the advantages:
  1. Low time to implementation – You can connect to Google Analytics in minutes and move data to Redshift in real-time
  2. Fully Managed – No code, hassle-free data replication
  3. Complete Data – Hevo’s unique architecture ensures that the data is ingested to the warehouse without any data loss
  4. Alerts and Notification – Whenever there are unresolved errors, you will be notified over either slack or email in real time
  5. Scalability – Hevo can scale as your business grows. You can move data from more data sources ( than just Google Analytics whenever needed. Additionally, Hevo is built to handle the increasing amount of data your business would generate
  6. Automatic schema detection, mapping, and evolution – Hevo detects Google Analytics schema it receives data for replication. When the sources schema changes in Google Analytics, Hevo makes changes on Amazon Redshift, thereby ensuring that the data is moved reliably
  7. Exceptional Support: Hevo will always have your back by providing 24×7 priority support over Slack and email
Data and insights are the keys to success in business, and good insights can only come from correct, accurate and relevant data. Hevo, a 100% fault tolerant, easy to use Data Pipeline Platform ensures that your valuable data is moved from Google Analytics to Redshift with care and precision. With Hevo, you can additionally connect to 100s of other data sources such as SaaS applications, Cloud Storage, Databases and more ( thereby, future-proofing your analytics infrastructure.
Sign up for a 14-day free trial here and experience a seamless data replication experience from Google Analytics to Redshift.

No-code Data Pipeline for Redshift