How To Export Google Analytics Data: A Detailed Guide

Tutorials • June 23rd, 2020

Introduction

Google Analytics gives organizations detailed information about user interactions on their websites and mobile applications. Its ease of integration and comprehensive reporting dashboard make it one of the most popular choices for deriving customer analytics from internet-based products. Decision-makers can rely on Google Analytics data to choose between marketing alternatives and decide what to focus on in their customer acquisition journey. It also provides a simple interface to set up machine learning algorithms on user data. Even though the reporting dashboard is comprehensive, organizations sometimes need access to raw hit-level or view-level data from their websites to perform deeper analysis.

Export Google Analytics data: GA "Home" page
Image by Ajay Nainani via Google

Google allows you to export raw Google Analytics data through its paid offering, GA360. The cost of this offering puts it out of reach for smaller organizations. This article explains how to export raw hit-level data from Google Analytics without access to GA360.

A Simpler Alternative To Replicate Google Analytics Data To Your Data Warehouse

Hevo Data, a No-code Data Pipeline, is a fully automated solution that can be used to extract, transform, and load Google Analytics data into a data warehouse with just a few clicks. Hevo provides an intuitive user interface and it supports robust data transformations. Check out some of Hevo’s cool features:

  • Fully Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you always have analysis-ready data.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has built-in integrations for hundreds of sources that can help you scale your data infrastructure as required.
  • Live Support: The Hevo team is available round the clock to extend its support to your team through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.

Get started with Hevo by signing up for a free 14-day trial!

Prerequisites

  • Google Analytics account with admin privileges.
  • Basic knowledge of dimensions and metrics in Google Analytics.

Getting Data From Google Analytics

Google enables programmatic access to GA data through the Reporting API V4. The data is structured in terms of dimensions and metrics. Dimensions are the attributes by which data is grouped, and metrics are the quantitative measurements reported for each group. For example, ‘country’ is a dimension and the number of sessions is a metric.
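To make the distinction concrete, here is a minimal Python sketch that groups a handful of made-up hit records by a dimension (country) and computes a metric (the number of distinct sessions). The records and field names are illustrative assumptions, not actual Google Analytics output.

```python
from collections import defaultdict

# Hypothetical raw hits; the field names are illustrative only.
hits = [
    {"country": "India", "session_id": "s1"},
    {"country": "India", "session_id": "s1"},
    {"country": "India", "session_id": "s2"},
    {"country": "Germany", "session_id": "s3"},
]


def sessions_by_country(rows):
    """Group hits by the 'country' dimension and count distinct sessions (the metric)."""
    sessions = defaultdict(set)
    for row in rows:
        sessions[row["country"]].add(row["session_id"])
    return {country: len(ids) for country, ids in sessions.items()}


report = sessions_by_country(hits)
print(report)
```

Once data is aggregated this way, the individual hits can no longer be recovered from the report, which is why the rest of this article tags every hit with its own identifier.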

Google Analytics provides a set of dimensions and metrics as default. Users can also create their own dimensions and metrics by making small modifications to the tracking code that is deployed with the website. To get hold of raw hit level data from Google Analytics, you need to use some custom dimensions.

Creating Custom Dimensions In Google Analytics

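Go to the admin section and navigate to the property where the custom dimension is to be added. Click on ‘Create New Custom Dimension’ and add the dimension name. For now, let us define it as hit_id. Select the scope as ‘hit’, check the active box, and click on ‘Create’. Repeat these steps to create a second dimension named brow_id, since the tracking code in the next step maps both hit_id and brow_id. The snippets that follow assume these are the first two custom dimensions on the property, so they receive the indexes 1 and 2.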

Adding The Custom Dimension Tracking Code To The Google Analytics Script

Google Analytics uses a script called gtag.js to track user behavior on your website. Add the below snippet to the gtag.js section of each web page for which you want hit level data:

gtag('config', 'GA_MEASUREMENT_ID', {
  'custom_map': {'dimension1': 'hit_id', 'dimension2': 'brow_id'}
});

In the above code, replace GA_MEASUREMENT_ID with the ID of your own Google Analytics property. The keys in the custom map follow the convention dimension<number from 1-20>, where the number is the index of the custom dimension in the property. Each value is the parameter name used to set that dimension in the tracking code.
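On the reporting side, the same slot is addressed as ga:dimension<number>, as in the batchGet request later in this article. The small Python helper below is a sketch of that correspondence. The function name and the custom_map dictionary are hypothetical, introduced only for illustration, and the upper bound of 20 is taken from the convention described above.

```python
def reporting_api_name(index, max_index=20):
    """Return the Reporting API name for a custom dimension index, e.g. 1 -> 'ga:dimension1'."""
    if not isinstance(index, int) or not 1 <= index <= max_index:
        raise ValueError(f"custom dimension index must be an integer from 1 to {max_index}")
    return f"ga:dimension{index}"


# Mirrors the custom_map in the tracking code: slot index -> parameter name.
custom_map = {1: "hit_id", 2: "brow_id"}
api_names = {name: reporting_api_name(index) for index, name in custom_map.items()}
print(api_names)
```

Keeping this mapping in one place helps avoid requesting the wrong slot when more custom dimensions are added later.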

You must now create a unique id for the user’s browser using the below snippet:

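// Assign a browser id only once; the cookie marks that it has been set.
if (document.cookie.indexOf('browser_uuid_set=1') == -1) {
  gtag('set', {'brow_id': Math.random()});
  // Replace the domain below with your own site's domain.
  document.cookie = 'browser_uuid_set=1; expires=Fri, 01 Jan 2100 12:00:00 UTC; domain=.arem.us; path=/';
}

// Use the current time in milliseconds as the hit identifier.
gtag('set', {'hit_id': new Date().getTime()});
gtag('event', 'page_view');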

The timestamp serves as the identifier of the hit here. Every time a hit is made, gtag.js sends a page_view event with two details: the browser id that was created and the timestamp of that view.
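Because new Date().getTime() returns the number of milliseconds since the Unix epoch, a hit_id can later be turned back into a readable timestamp. The following Python sketch shows one way to do that. The function name hit_time and the sample value are assumptions made for illustration.

```python
from datetime import datetime, timezone


def hit_time(hit_id):
    """Convert a hit_id (milliseconds since the Unix epoch, as a string or int) to a UTC datetime."""
    return datetime.fromtimestamp(int(hit_id) / 1000, tz=timezone.utc)


# Dimension values come back from the API as strings, so a string is used here.
example = hit_time("1592913600000")
print(example.isoformat())
```

Note that two hits from different browsers in the same millisecond would share a hit_id, so the pair of browser id and timestamp is a safer key than the timestamp alone.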

The next steps retrieve this raw hit level data from the Google Analytics servers.

Installing Google Reporting API V4 Python Library

You will now access the data using the Google Reporting API V4. Install the Python library for the API using the below command:

sudo pip install --upgrade google-api-python-client oauth2client

Importing The Required Libraries

Using the Python library, create a script to download the hit level data into a CSV. Begin by importing the required libraries. Use the below snippet:

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

Initializing The Required Variables

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = '<KEY_JSON_FILE>'
VIEW_ID = '<REPLACE_WITH_VIEW_ID>'

The above variables are required for OAuth authentication. Replace the key file location with the path to the JSON key you downloaded while setting up a service account for Google Analytics, and replace the view id with the id of the view you want to query. The VIEW_ID can be found in the admin section.
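A malformed or wrong key file is a common source of authentication errors, so it can help to check the file before using it. The sketch below assumes the standard layout of a service account JSON key, which includes the fields type, client_email, and private_key. The helper name check_key_file and the dummy key written to a temporary file are hypothetical and exist only for the demonstration.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(path):
    """Load a service account key file and return its client_email, or raise ValueError."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service_account key, found {key['type']!r}")
    return key["client_email"]


# Demonstration with a dummy key written to a temporary file.
dummy_key = {
    "type": "service_account",
    "client_email": "reader@example.iam.gserviceaccount.com",
    "private_key": "dummy",
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(dummy_key, tmp)
email = check_key_file(tmp.name)
os.remove(tmp.name)
print(email)
```

The returned email address is the account that needs read access to the Google Analytics view you want to query.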

Initializing The Objects

Initialize the objects for accessing the data using the below snippet:

credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)
# Build the service object.
analytics = build('analyticsreporting', 'v4', credentials=credentials)

Retrieving Response From Reporting API

Use the batchGet method in the library to retrieve the response from the Reporting API. 

response = analytics.reports().batchGet(
    body={
        'reportRequests': [{
            'viewId': VIEW_ID,
            'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
            # The API expects at least one metric; ga:hits counts the hits in each row.
            'metrics': [{'expression': 'ga:hits'}],
            'dimensions': [{'name': 'ga:dimension1'}, {'name': 'ga:dimension2'}],
        }]
    }
).execute()

The response is a JSON document, returned by the library as a Python dictionary, that contains a nested map of the two dimensions that were sent to Google Analytics for each page view. It can be parsed into CSV or loaded directly into a data warehouse after the required transformations.
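As a rough illustration of the parsing step, the Python sketch below flattens a response into CSV text. It assumes the usual Reporting API V4 response layout (reports, columnHeader, data.rows) and a single report request. The sample dictionary is hand-written with made-up values, including a ga:hits metric column added for illustration, and the function names are hypothetical.

```python
import csv
import io


def report_to_rows(response):
    """Flatten a batchGet response into a header row followed by one row per result row."""
    rows = []
    for report in response.get("reports", []):
        header = report.get("columnHeader", {})
        dimension_names = header.get("dimensions", [])
        metric_names = [
            entry["name"]
            for entry in header.get("metricHeader", {}).get("metricHeaderEntries", [])
        ]
        if not rows:
            # Write the header once, taken from the first report.
            rows.append(dimension_names + metric_names)
        for row in report.get("data", {}).get("rows", []):
            # Each date range contributes one entry in row["metrics"].
            values = [v for m in row.get("metrics", []) for v in m.get("values", [])]
            rows.append(row.get("dimensions", []) + values)
    return rows


def rows_to_csv(rows):
    """Serialize a list of rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hand-written sample in the shape of a batchGet response; the values are made up.
sample = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:dimension1", "ga:dimension2"],
            "metricHeader": {"metricHeaderEntries": [{"name": "ga:hits", "type": "INTEGER"}]},
        },
        "data": {"rows": [
            {"dimensions": ["1592913600000", "0.4251"], "metrics": [{"values": ["1"]}]},
            {"dimensions": ["1592913660000", "0.4251"], "metrics": [{"values": ["1"]}]},
        ]},
    }]
}

csv_text = rows_to_csv(report_to_rows(sample))
print(csv_text)
```

In a real script, the response variable from the batchGet call would take the place of sample, and the CSV text could be written to a file instead of printed.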

Conclusion

Great! You have now learned how to export Google Analytics data by adding custom dimensions to the Google Analytics tracking code and extracting hit level data through those dimensions. Here are some of the typical challenges developers face while implementing the above approach in production:

  • This approach requires you to modify the tracking code as well as write a custom script using the Reporting API Python library to download the data. Taking this to production will require you to write even more code. Cloud-based ETL tools like Hevo can accomplish this without the need to write any custom code.
  • The above approach will work well for a one-time offload, but data warehousing scenarios rarely stop there. You will need to build additional logic to execute this continuously. Hevo’s scheduling capability handles this for you, saving critical development time.
  • The Reporting API has quotas and limits associated with it, so the application logic must be able to work within them. Hevo automatically adheres to such throttling limits and relieves you of this implementation.
  • The raw data in most cases will have to be transformed into different formats before inserting it into the data warehouse. Hevo comes with comprehensive transformation support and helps you accomplish these in a few clicks.
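For readers who do build this themselves, one common way to cope with quota errors is to retry with exponential backoff. The Python sketch below is a generic illustration, not part of the Reporting API library: call_with_backoff, QuotaError, and flaky_request are hypothetical names, and the stand-in request simply fails twice before succeeding.

```python
import random
import time


def call_with_backoff(request_fn, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying with exponential backoff plus jitter on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception as error:
            if attempt == max_attempts - 1 or not is_retryable(error):
                raise
            # With the default base_delay: 1s, 2s, 4s, ... plus up to one second of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())


class QuotaError(Exception):
    """Stand-in for a rate limit error raised by an API client."""


attempts = []


def flaky_request():
    """Stand-in request that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise QuotaError("rate limit exceeded")
    return {"reports": []}


# Record the delays instead of sleeping so the demonstration runs instantly.
delays = []
result = call_with_backoff(
    flaky_request,
    lambda error: isinstance(error, QuotaError),
    sleep=delays.append,
)
print(result, len(attempts))
```

In a real script, request_fn would wrap the batchGet call, and is_retryable would inspect the error raised by the client library, for example the HTTP status on googleapiclient.errors.HttpError. Which statuses to retry is a design choice left to the reader.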

So if all that scripting magic and the associated challenges feel like too much effort, you can think of using a completely automatic, cloud-based ETL tool like Hevo that can extract, transform and load Google Analytics data to almost any data warehouse in a matter of a few clicks. Hevo provides a simple user interface and comes with complex transformation support. The best part is, you can try it for free! Sign up here for the free 14-day trial.

Share your thoughts on how to export Google Analytics raw data in the comments. We would love to hear from you!
