Google Analytics to SQL Server: Move Data Instantly

on Data Integration • June 7th, 2020 • Write for Hevo

Google Analytics helps organizations track user activity and the effectiveness of their content. It is widely used by marketers and advertisers around the world to gain critical insights from their marketing campaigns. The data obtained from Google Analytics is often transferred to SQL Server to back up significant business data and make it available for further analysis. This blog post introduces two approaches that can be used to transfer data from Google Analytics to SQL Server.

The contents of this post are listed below. The navigation links will help you quickly jump to the point of your interest:

  1. Introduction to Google Analytics
  2. Introduction to SQL Server
  3. Prerequisites to Data Transfer
  4. Method 1: Using Custom Scripts to Load Data
  5. Challenges involved in using Custom Scripts
  6. Method 2: Using a No-code Data Pipeline to Load Data

Moving Data from Google Analytics to SQL Server

There are various methods for transferring data from Google Analytics to SQL Server. In this post, you will walk through the following approaches.

Method 1: Using Custom Scripts to Move Data
To transfer data from Google Analytics to SQL Server with custom scripts, you first generate account keys, install the required Python libraries, and finally build a Python script to load the data. Using custom scripts can be a time-consuming process and requires continuous monitoring of the written scripts.

Method 2: Using a No-code Data Pipeline, Hevo
Using Hevo, you can seamlessly move data from Google Analytics to SQL Server without writing a single piece of code. Since Hevo is fully managed by a team of experts, you would not need to monitor scripts regularly, leaving you with enough time to focus on critical business activities. Sign up for a 14-day free trial today!

Introduction to Google Analytics

Google Analytics helps companies keep track of user details and advertising campaigns using a simple JavaScript plugin. It provides an excellent reporting dashboard and the ability to run various analyses on user activity data. It even offers advanced machine learning capabilities, helping organizations make better decisions without having to know the concepts or implementation details of machine learning algorithms. Google Analytics also makes raw user activity data available to companies that want to run custom algorithms and reports. This requires the raw clickstream data to be loaded into the organization's own data store.

Introduction to SQL Server

SQL Server is a relational database system provided by Microsoft. It can hold petabytes of data and run across hundreds of parallel cores. It has a comprehensive querying layer that aids business analysis and integrates tightly with reporting tools like Power BI. Recent versions have Spark support built in, which means you can analyze large amounts of data using Spark SQL without much additional setup. SQL Server provides enterprise-grade security and support, which makes it a favourite among organizations with strict compliance requirements. This post is about loading data from Google Analytics to SQL Server.

Prerequisites

The following checklist will help you stay prepared for the data transfer:

  • A valid Google Analytics account with admin privileges. Create an account here.
  • Basic knowledge of Python and how to use Python libraries.
  • An SQL Server installation with write permission. Find the SQL Server installation guide here and click on this link to learn how you can configure account permissions.

Method 1: Using Custom Scripts to Load Data

This section of the post will help you build a custom script to migrate data from Google Analytics to SQL Server.

Google Analytics provides data through the Reporting API V4. This API provides aggregate analytics information on the basis of metrics and dimensions. Metrics are the quantities being measured, such as the number of users; dimensions are the attributes by which metrics are aggregated, such as the time window or the user's country. The API uses the OAuth authentication protocol and requires a service account to access the data.
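To make the metrics-and-dimensions idea concrete, the snippet below walks a small, made-up fragment shaped like a Reporting API V4 response (the same `reports` → `data` → `rows` structure parsed later in this post) and pairs each dimension value with its aggregated metric. The values themselves are illustrative, not real data.

```python
# Illustrative fragment shaped like a Reporting API V4 response;
# the numbers are made up for demonstration.
sample_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:country'],
            'metricHeader': {'metricHeaderEntries': [{'name': 'ga:sessions'}]}
        },
        'data': {
            'rows': [
                {'dimensions': ['India'], 'metrics': [{'values': ['23']}]},
                {'dimensions': ['Germany'], 'metrics': [{'values': ['34']}]},
            ]
        }
    }]
}

def rows_to_records(report):
    """Pair each dimension value with its aggregated metric value."""
    records = []
    for row in report['data']['rows']:
        country = row['dimensions'][0]
        sessions = int(row['metrics'][0]['values'][0])
        records.append((country, sessions))
    return records

print(rows_to_records(sample_response['reports'][0]))
# [('India', 23), ('Germany', 34)]
```

Here `ga:sessions` is the metric and `ga:country` is the dimension it is aggregated over.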

Below are the steps involved in developing custom scripts to load data from Google Analytics to SQL Server.

Step 1: Generating the Account Keys on Google Analytics
Step 2: Installing the Required Python Libraries
Step 3: Building a Python Script
Step 4: Loading the Data from Google Analytics to SQL Server

Step 1: Generating the Account Keys on Google Analytics

Go to the Google API console and create a project. In the 'Credentials' section of the project, click 'Create Credentials'. Select 'Service account' and click 'Create Key'. Select JSON and save the key on your computer. We will be using this key in the next steps.

Google Analytics to SQL Server: Account creation

Step 2: Installing the Required Python Libraries

Google Analytics provides client libraries in Python, Java, JavaScript, etc. to access its APIs. Using command-line utilities like cURL directly is a tedious process because of the complexity of the JSON structure and the authentication process.

Use the following command to install the Python GA library to your environment.

sudo pip install --upgrade google-api-python-client

The above command requires a Python environment to be pre-installed. Let us now proceed to build the script.

Step 3: Building a Python Script

This section deals with building a Python script using the installed Google API client library. We will start by importing the required libraries. The stages involved in this step are as follows:

  1. Import the Required Libraries
  2. Initialize the Keys and Configuration Parameters
  3. Request the Reports JSON from the API
  4. Parse the JSON to Extract the Relevant Fields

1. Import the Required Libraries

Use the following command to import the Python libraries.

from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

2. Initialize the Keys and Configuration Parameters


SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = '<JSON_FILE_WITH_KEY>'
VIEW_ID = '<VIEW_ID>'

credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)

# Build the service object.
analytics = build('analyticsreporting', 'v4', credentials=credentials)

The JSON file is the one that was saved in the first step. The VIEW_ID can be found in the Admin section: under the 'Views' column, click 'View Settings'. Under 'Basic Settings', you will find the VIEW_ID.

3. Request the Reports JSON from the API


response = analytics.reports().batchGet(
    body={
        'reportRequests': [{
            'viewId': VIEW_ID,
            'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
            'metrics': [{'expression': 'ga:sessions'}],
            'dimensions': [{'name': 'ga:country'}]
        }]
    }
).execute()

The above code snippet requests a report for a specific date range. The metric we access here is the session count, and the dimension is the country.
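The request body can carry more than one metric and dimension at a time. As a sketch, the helper below builds a body with two standard Reporting API names, `ga:users` and `ga:date`, added alongside the ones used above; swap in whichever metrics and dimensions your report needs.

```python
# Build a batchGet request body with multiple metrics and dimensions.
# 'ga:users' and 'ga:date' are standard Reporting API V4 names; replace
# them with the metrics and dimensions you actually need.
def build_request_body(view_id, start_date, end_date):
    return {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': start_date, 'endDate': end_date}],
            'metrics': [{'expression': 'ga:sessions'},
                        {'expression': 'ga:users'}],
            'dimensions': [{'name': 'ga:country'},
                           {'name': 'ga:date'}]
        }]
    }

body = build_request_body('<VIEW_ID>', '30daysAgo', 'today')
# Pass it as before: analytics.reports().batchGet(body=body).execute()
```

Each row in the response will then carry one value per dimension and one value per metric, in the order declared here.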

4. Parse the JSON to Extract the Relevant Fields

import pandas as pd
from pandas.io.json import json_normalize

ga_reports = response['reports'][0]
columnHeader = ga_reports['columnHeader']['dimensions']
metricHeader = ga_reports['columnHeader']['metricHeader']['metricHeaderEntries']
report_columns = columnHeader
for metric in metricHeader:
    report_columns.append(metric['name'])
report_data = json_normalize(ga_reports['data']['rows'])
g_dimensions = pd.DataFrame(report_data['dimensions'].tolist())
g_metrics = pd.DataFrame(report_data['metrics'].tolist())
g_metrics = g_metrics.applymap(lambda x: x['values'])
g_metrics = pd.DataFrame(g_metrics[0].tolist())
result = pd.concat([g_dimensions, g_metrics], axis=1, ignore_index=True)
result.to_csv('reports.csv')

The above code snippet uses pandas DataFrames to process the JSON and save it as a CSV with metrics and dimensions. Execute the script to generate the CSV file. The resulting CSV will have the below columns:

id, ga:country, ga:sessions
123x, India, 23
341a, Germany, 34

Step 4: Loading the Data from Google Analytics to SQL Server

 You will now load the above CSV to SQL Server using the below command.

BULK INSERT GaSessions
FROM 'C:\CSVData\reports.csv'
WITH (
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK
)

That concludes the steps to load data from Google Analytics to SQL Server. The task required us to build a custom script, generate a CSV using it, and then load the CSV to SQL Server. In reality, there are a lot more challenges in executing such a load process. The below section will give you a good idea about the limitations of this approach.
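BULK INSERT is SQL Server's own bulk-loading command; to make the load step concrete without a SQL Server instance at hand, the sketch below replays it in Python using the standard library's sqlite3 as a stand-in database (against a real SQL Server you would connect with a driver such as pyodbc instead). The table name `GaSessions` and the CSV contents match the example above.

```python
import csv
import io
import sqlite3

# The CSV produced by the script above (first column is the pandas index).
csv_text = """id,ga:country,ga:sessions
123x,India,23
341a,Germany,34
"""

# sqlite3 stands in for SQL Server here purely for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE GaSessions (id TEXT, country TEXT, sessions INTEGER)')

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row, as FIRSTROW = 2 does in BULK INSERT
conn.executemany('INSERT INTO GaSessions VALUES (?, ?, ?)', reader)
conn.commit()

total = conn.execute('SELECT SUM(sessions) FROM GaSessions').fetchone()[0]
print(total)  # 57
```

The row-by-row INSERT here is the logical equivalent of the bulk load; for large files, BULK INSERT on the server is far faster because it avoids a statement per row.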

Challenges involved in using Custom Scripts

  1. The first and foremost one is that this method requires you to write a lot of custom code. If you have a transformation requirement before loading to SQL Server, be prepared to spend even more time writing custom code.
  2. This approach is suitable for a one-time bulk insert. In most cases, you will need to execute a continuous load process to ensure that SQL Server always has the latest data. In that case, you will need logic to handle deletes and duplicates.
  3. The above approach only deals with one API. Google Analytics provides a large amount of data in a variety of APIs, each of which follows different JSON structures. The developer will need to be familiar with all these and write custom code to accommodate them.
  4. Google APIs are rate limited. Hence, the logic has to work around these thresholds and extract data without triggering rate violation exceptions.

An alternative to spending all this effort on building a custom script is to use a cloud-based ETL tool like Hevo, which can handle such load jobs in a matter of a few clicks.

Method 2: Loading Data using Hevo

Hevo, a No-code Data Pipeline, provides an easy-to-use, completely managed cloud-based service that can execute the load operation from Google Analytics to SQL Server in a few clicks. Hevo also helps you transform your data and automatically map schemas in databases. The data migration from Google Analytics to SQL Server can be carried out in two simple steps:

Step 1: Connect Hevo to Google Analytics to set it up as your source.

Step 2: Configure SQL Server as a destination and start moving data instantly.

Key Highlights of Hevo

  • Easy-to-use Platform: Hevo has a straightforward and intuitive UI to configure the jobs.
  • Data Transformation: Hevo supports complex transformations for the data on the move.
  • Real-time Data Transfer: Support for real-time synchronization across a variety of sources and destinations. 
  • Fully Managed: Completely managed service which spares the users of all infrastructure and maintenance activities. 
  • Automatic Schema Mapping: Hevo can automatically detect your source’s schema type and match it with the schema type of your destination. 

Would you be willing to give Hevo a try? Get started by signing up for a free 14-day trial today!

How do you load data from Google Analytics to SQL Server? Share your thoughts in the comments section below.
