
Google Analytics to PostgreSQL: Steps to Integrate Data in Minutes

Google Analytics is a one-stop solution for website owners who want comprehensive information on all user activity happening on a website. Google Analytics provides unique insights about users and allows owners to measure and monitor the performance of their marketing, content, and products. It also lets users take advantage of built-in advanced algorithms to make the most of their data. Even though Google provides a comprehensive set of analysis tools to work with the data, most organizations will need to pull the raw data into their own database, since having it under their control allows them to combine it with their own customer and product data for much deeper analysis. This post is about importing data from Google Analytics to PostgreSQL – one of the most popular relational databases in the market today.

Two ways to move data from Google Analytics to PostgreSQL

  • Building a Custom ETL Pipeline

You could invest precious engineering resources to build custom ETL scripts that extract data from the Google Analytics API and load it into PostgreSQL. Once this infrastructure is set up, you would need to provision tech bandwidth to maintain and monitor the pipeline so that you have consistent and reliable data in PostgreSQL at all times.

  • Implement a Data Pipeline Platform like Hevo Data that works out of the box

    A ready-to-use Data Pipeline Platform such as Hevo would come with native integrations to both Google Analytics and PostgreSQL. Implementing a platform like Hevo would allow you to bring data from Google Analytics to PostgreSQL reliably in just a few minutes without the need for tech bandwidth. This, in turn, will reduce the project timelines drastically and provide swift access to Google Analytics data.

This blog covers the first approach (building custom ETL scripts) extensively. Additionally, the article also highlights the challenges and limitations of this approach. Towards the end, the post also discusses simpler ways of achieving the same outcome.

Google Analytics to PostgreSQL: Building a Custom ETL Pipeline

Getting data from Google Analytics

Click event data from Google Analytics can be accessed through the Reporting API V4. There are two sets of REST APIs in the Reporting API V4, tailor-made for specific use cases.

  1. Metrics API – These APIs allow users to get aggregated analytics information on user behavior on the basis of available dimensions. Dimensions are the attributes by which metrics are aggregated. For example, if time is a dimension, the number of users in a specific time window would be a metric.
  2. User Activity API – This API allows you to access information about the activities of a specific user. Knowledge of the user ID is required in this case. To get the user IDs of people accessing your page, you will need to modify the client-side Google Analytics function you use so that it captures the client ID. This is not spelled out in the Google developer documentation, but there is ample documentation on the internet about it. Ensure you consult the laws and restrictions of your local country before attempting this, since its legality depends on the privacy laws of the country. After changing the client script, you will also need to register the user ID as a custom dimension in the Google Analytics dashboard. A sketch of such a request follows this list.
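For illustration, here is a minimal sketch of a User Activity API request using the same Python client library that is set up later in this post. The `analytics` object and `VIEW_ID` come from the steps below; the client ID value is a placeholder.

    # Assumes `analytics` is the Reporting API v4 service object built in
    # the steps below; the client ID here is a placeholder.
    user_activity = analytics.userActivity().search(
        body={
            'viewId': VIEW_ID,
            'user': {'type': 'CLIENT_ID', 'userId': '555.12345'},
            'dateRange': {'startDate': '7daysAgo', 'endDate': 'today'},
        }
    ).execute()

    # Each session in the response lists the individual hits for that user.
    for session in user_activity.get('sessions', []):
        print(session['sessionId'], len(session.get('activities', [])))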

Google Analytics APIs use OAuth 2.0 as the authentication protocol. Before accessing the APIs, the user first needs to create a service account in the Google API Console and generate an authentication key. Let us review how this can be done.

  1. Go to the Google service accounts page and select a project. If you have not already created a project, please create one.
  2. Click on Create Service Account.
  3. You can ignore the permissions for this exercise.
  4. In the 'Grant users access to this service account' section, click Create Key.
  5. Select JSON as the format for your key.
  6. Click Create and you will be prompted with a dialog to save the key to your local machine. Save the key.

We will use the information from this key file when we actually access the API.

Accessing Google Reporting API V4

Google provides easy-to-use client libraries in Python, Java, and PHP to access its reporting APIs. These libraries are the preferred way to download the data, since the authentication procedure and the complex JSON response format make it difficult to access these APIs using command-line tools like cURL. Detailed documentation of this API can be found here. In this post, the Python library is used. The following steps and code snippets explain the procedure to load data from Google Analytics to PostgreSQL.

  1. Install the Python GA library in your environment.
    sudo pip install --upgrade google-api-python-client

    Before this step, please ensure that a Python programming environment is already installed and working.

    We will now start writing the script for downloading the data as a CSV file.

  2. Import the required libraries.
    from googleapiclient.discovery import build  # 'apiclient.discovery' in older versions
    from oauth2client.service_account import ServiceAccountCredentials
  3. Initialize the required variables for OAuth authentication.
    SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
    KEY_FILE_LOCATION = '<path_to_the_json_key_file>'
    VIEW_ID = '<your_view_id>'

    Replace the key file location and view ID with the values obtained in the service account creation step. View IDs identify the views from which you will collect the data. To get the view ID of a particular view you have already configured, go to the admin section of the Google Analytics dashboard, click on the view you need, and go to View Settings.

  4. Build the required objects.
    credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)

    # Build the service object.
    analytics = build('analyticsreporting', 'v4', credentials=credentials)
  5. Execute the method to get the data. The below query gets the number of sessions aggregated by country for the last 7 days. (Note that the API paginates large results; a pagination sketch follows these steps.)
    response = analytics.reports().batchGet(
        body={
            'reportRequests': [{
                'viewId': VIEW_ID,
                'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:sessions'}],
                'dimensions': [{'name': 'ga:country'}]
            }]
        }
    ).execute()
  6. Parse the JSON and write the contents into a CSV file.
    import pandas as pd
    from pandas.io.json import json_normalize  # pd.json_normalize in newer pandas

    reports = response['reports'][0]
    # Build the column names: dimension headers first, then metric headers.
    columnHeader = reports['columnHeader']['dimensions']
    metricHeader = reports['columnHeader']['metricHeader']['metricHeaderEntries']
    columns = columnHeader
    for metric in metricHeader:
        columns.append(metric['name'])

    # Flatten the rows, then split dimensions and metric values into frames.
    data = json_normalize(reports['data']['rows'])
    data_dimensions = pd.DataFrame(data['dimensions'].tolist())
    data_metrics = pd.DataFrame(data['metrics'].tolist())
    data_metrics = data_metrics.applymap(lambda x: x['values'])
    data_metrics = pd.DataFrame(data_metrics[0].tolist())
    result = pd.concat([data_dimensions, data_metrics], axis=1, ignore_index=True)
    result.columns = columns
    result.to_csv('reports.csv', index_label='id')

    Save the script and execute it. The result will be a CSV file with the following columns:

    id, ga:country, ga:sessions
  7. This file can be directly loaded into a PostgreSQL table using the below command. The command assumes you have already created a table named sessions_table whose columns match the CSV.
    COPY sessions_table FROM 'reports.csv' DELIMITER ',' CSV HEADER;

    Note that COPY reads the file from the database server's filesystem; when loading from a client machine, use psql's \copy command instead.
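As mentioned in step 5, batchGet returns at most pageSize rows per request (1,000 by default, 100,000 at most), so larger reports have to be fetched page by page using the nextPageToken field in the response. Here is a minimal sketch of how that could look, reusing the analytics object from step 4 (the fetch_all_rows helper name is ours):

    def fetch_all_rows(analytics, report_request):
        """Page through a Reporting API v4 request and collect all rows.
        `report_request` is a single reportRequest dict like the one in step 5."""
        rows = []
        page_token = None
        while True:
            if page_token:
                report_request['pageToken'] = page_token
            response = analytics.reports().batchGet(
                body={'reportRequests': [report_request]}).execute()
            report = response['reports'][0]
            rows.extend(report['data'].get('rows', []))
            page_token = report.get('nextPageToken')
            if not page_token:
                return rows

The same loop is also a natural place to add throttling (for example, a time.sleep between calls) to stay within the API quotas discussed in the limitations section below.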

You now have your Google Analytics data in your PostgreSQL table. Now that we know how to get the Google Analytics data using custom code, let's look into the limitations of this method.

Google Analytics to PostgreSQL: Limitations of Building Custom Code

  1. The above method requires you to write a lot of custom code. Google's output JSON structure is complex, and you may have to change the above code depending on the data you query from the API.
  2. This approach is fine for a one-off data load to PostgreSQL, but in many cases organizations need to run it periodically and merge each day's data while handling duplicates (see the upsert sketch after this list). This will force you to write a very complex import tool just for Google Analytics.
  3. The above method addresses only one of the APIs available for Google Analytics. There are many other Google Analytics APIs that provide different types of data, for example the Real Time Reporting API. Each comes with a different output JSON structure, and developers will need to write separate parsers for each.
  4. The APIs are rate limited, which means the above approach will lead to errors unless logic is implemented to throttle the API calls.
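To illustrate point 2, a periodic load would typically stage each day's CSV and upsert it into the target table on a natural key, so that re-running a day does not create duplicates. Here is a minimal sketch using psycopg2, assuming a hypothetical sessions_table with a unique constraint on (report_date, country) and a CSV with matching columns:

    # Hedged sketch of an idempotent daily load; table and column names
    # are illustrative, and (report_date, country) is assumed unique.
    import psycopg2

    conn = psycopg2.connect('dbname=analytics user=postgres')
    with conn, conn.cursor() as cur:
        # Stage the CSV into a temporary table with the same shape.
        cur.execute("CREATE TEMP TABLE staging (LIKE sessions_table INCLUDING DEFAULTS);")
        with open('reports.csv') as f:
            cur.copy_expert(
                "COPY staging (report_date, country, sessions) "
                "FROM STDIN WITH (FORMAT csv, HEADER true)", f)
        # Upsert: insert new rows, overwrite metrics for rows already loaded.
        cur.execute("""
            INSERT INTO sessions_table (report_date, country, sessions)
            SELECT report_date, country, sessions FROM staging
            ON CONFLICT (report_date, country)
            DO UPDATE SET sessions = EXCLUDED.sessions;
        """)
    conn.close()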

A solution to all the above problems is to use a completely managed ETL solution like Hevo, which provides a simple click-and-execute interface to get all your Google Analytics data into a database or data warehouse of your choice.

An easier solution to move data from Google Analytics to PostgreSQL

The best way to bypass the above constraints is to use a Data Pipeline Platform like Hevo (14-day free trial) that works out of the box. Hevo can help you import data from Google Analytics to PostgreSQL in two simple steps:

  1. Authenticate and Connect Google Analytics Source
  2. Configure the PostgreSQL Database where you want to load data

Hevo’s will do all the heavyweight lifting to ensure that your data is securely moved from Google Analytics to PostgreSQL. Hevo automatically handles all the schema changes that may happen at Google Analytics’ end. This ensures that you have a dependable infrastructure that delivers error-free data in PostgreSQL at all points.

In addition to Google Analytics, Hevo natively integrates with tons of other applications (Databases, Marketing and Sales Applications, Analytics Applications, etc.) ensuring that you have a reliable partner to move data to PostgreSQL at any point.

Sign up for a 14-day free trial with Hevo here and experience a hassle-free data integration from Google Analytics to PostgreSQL. 
