Google Analytics to PostgreSQL: 2 Easy Methods

Tutorial • February 5th, 2020

Google Analytics is a one-stop solution for website owners who want comprehensive information on all user activity happening on a website. Google Analytics provides unique insights about users and allows owners to measure and monitor the performance of their marketing, content, and products.

It also enables users to take advantage of built-in advanced algorithms to make the most of their data. Even though Google provides a comprehensive set of analysis tools to work with the data, most organizations will need to pull the raw data into their own database. Having the data under their control allows them to combine it with their own customer and product data to perform much deeper analysis.

This post is about importing data from Google Analytics to PostgreSQL, one of the most popular relational databases in the market today. The blog covers the first approach (building custom ETL scripts) extensively, highlights the challenges and limitations of that approach, and, towards the end, discusses a simpler way of achieving the same outcome.


Introduction to Google Analytics


Google Analytics helps companies keep track of user details and advertising campaigns using a simple JavaScript plugin. It provides an excellent reporting dashboard and the ability to run various analyses on user activity data. It even supports advanced machine learning capabilities, helping organizations make better decisions without having to know the concepts or implementation details of machine learning algorithms.

Google Analytics also makes raw user activity data available to the companies that want to run custom algorithms and reports. This requires the raw clickstream data to be loaded into the organization’s own data store. 

Here are a few key benefits of Google Analytics:

  • Understanding User Behavior: The key advantage of behavioral metrics is that they show you which pages get the most engagement and traction. With a better understanding of user behavior, you can optimize the way you interact with your users.
  • Easy to Find Your Target Audience: With the help of Google Analytics, you can easily define your target audience. That audience can then be used to optimize your website's content and offerings to increase engagement. An engaged audience is one of the best ways to improve a website's potential.
  • Data Reports and Customization: Google Analytics allows you to customize dashboards, alerts, and reports to analyze data that can fit every company’s different needs. Google Analytics provides an extensive library of user-generated reports and dashboards that can help you make data-driven decisions to improve efficiency.
  • Track Online Traffic: Google Analytics allows you to track traffic from all sources since having an understanding of where your audience comes from is a crucial aspect of running a business online. Pinpointing different traffic sources and understanding why and how much traffic comes to your website allows you to track the gains of your strategies.

Simplify Data Automation and ETL using Hevo’s No-code Data Pipeline

Hevo is a No-code Data Pipeline that offers a fully managed solution to set up data integration from 100+ data sources (including 30+ Free Data Sources like Google Analytics) and will let you directly load data to a Data Warehouse/Desired Destination and visualize it in a BI tool of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with minimal latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Simplify your data analysis with Hevo today!

Sign up here for a 14-Day Free Trial!

Introduction to PostgreSQL


Postgres is a popular relational database that uses a variant of SQL in its operations. Postgres can store and scale different workloads, making it a very versatile relational database, useful in everything from small applications to large-scale data warehousing. Postgres also has an architecture that encourages extensibility, enabling you to incorporate code from other languages without recompiling the database and even define your own data types. Postgres is also ACID-compliant and runs on all the major operating systems.

Key features of Postgres

  • ACID-Compliant: Postgres is ACID-compliant, ensuring that your transactions are processed reliably.
  • Open Source: Postgres is fully open-source and has an active community to help with efficiency tips, bug resolution, etc.
  • Extensibility: Postgres is very customizable and so can more readily incorporate new functionality into its system.

Two ways to move data from Google Analytics to PostgreSQL

Method 1: Using Manual ETL Scripts to Connect Google Analytics to PostgreSQL

You could invest precious engineering resources to build custom ETL scripts to extract data from Google Analytics API and load it to PostgreSQL. Once this infrastructure to move data from Google Analytics to PostgreSQL is set up, you would need to provision tech bandwidth to maintain and monitor the same so that you have consistent and reliable data in PostgreSQL at any point.

Method 2: Using Hevo Data to Connect Google Analytics to PostgreSQL

A ready-to-use Data Pipeline Platform such as Hevo would come with native integrations to both Google Analytics (a free source) and PostgreSQL. Implementing a platform like Hevo would allow you to bring data from Google Analytics to PostgreSQL reliably in just a few minutes without the need for tech bandwidth. This, in turn, will reduce the project timelines drastically and provide swift access to Google Analytics data.

Understanding the Methods to Connect Google Analytics to PostgreSQL

These are the methods that you can use to connect Google Analytics to PostgreSQL in a seamless fashion:

Method 1: Using Manual ETL Scripts to Connect Google Analytics to PostgreSQL

In this method of moving data from Google Analytics to PostgreSQL, you will first need to get data from Google Analytics followed by accessing Google Reporting API V4 as mentioned in the following section.

Getting data from Google Analytics

Click event data from Google Analytics can be accessed through Reporting API V4. There are two sets of Rest APIs in Reporting API V4 tailor-made for specific use cases.

  • Metrics API – These APIs allow users to get aggregated analytics information on user behavior based on available dimensions. Dimensions are the attributes by which metrics are aggregated. For example, if time is a dimension, the number of users in a specific time window would be a metric.
  • User Activity API – This API allows you to access information about the activities of a specific user. Knowledge of the user ID is required in this case. To get the user IDs of people accessing your page, you will need to modify the client-side Google Analytics function to capture the client ID. This information is not spelled out in the Google developer documentation, but there is ample documentation on the internet about it. Ensure you consult the laws and restrictions of your local country before attempting this, since its legality depends on the privacy laws of the country. After changing the client script, you will also need to register the user ID as a custom dimension in the Google Analytics dashboard.
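For illustration, a User Activity API call through the same Python client might look like the sketch below. The helper that builds the request body is hypothetical, and the field names follow the v4 userActivity.search method; treat this as a sketch, not a drop-in implementation.

```python
def build_activity_request(view_id, client_id, start_date, end_date):
    # Request body for the Reporting API v4 userActivity.search method.
    # CLIENT_ID is the identifier captured from the client-side tracker.
    return {
        'viewId': view_id,
        'user': {'type': 'CLIENT_ID', 'userId': client_id},
        'dateRange': {'startDate': start_date, 'endDate': end_date},
    }


# Usage with an authenticated `analytics` service object (built in the steps below):
# response = analytics.userActivity().search(
#     body=build_activity_request('<view_id>', '<client_id>',
#                                 '7daysAgo', 'today')
# ).execute()
```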

Google Analytics APIs use OAuth 2.0 as the authentication protocol. Before accessing the APIs, the user first needs to create a service account in the Google Analytics dashboard and generate authentication tokens. Let us review how this can be done.

  • Go to the Google Service Accounts page and select a project. If you have not already created a project, please create one.
  • Click on Create Service Account.
  • You can ignore the permissions for this exercise.
  • In the ‘Grant users access to this service account’ section, click Create key.
  • Select JSON as the format for your key.
  • Click Create and you will be prompted with a dialog to save the key on your local computer. Save the key.

We will be using the information from this step when we actually access the API.

Accessing Google Reporting API V4

Google provides easy-to-use libraries in Python, Java, and PHP to access its reporting APIs. These libraries are the preferred way to download the data, since the authentication procedure and the complex JSON response format make it difficult to access these APIs using command-line tools like cURL. Detailed documentation of this API can be found here. In this post, the Python library is used to access the API. The following steps and code snippets explain the procedure to load data from Google Analytics to PostgreSQL:

Step 1: Installing the Python GA Library to Your Environment

sudo pip install --upgrade google-api-python-client

Before this step, please ensure the Python programming environment is already installed and working. We will now start writing the script to download the data as a CSV file.

Step 2: Importing the Required Libraries

from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

Step 3: Initializing the Required Variables for OAuth Authentication

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = '<path_to_service_account_key.json>'
VIEW_ID = '<your_view_id>'

Replace the key file location with the path to the JSON key you saved during service account creation, and the view ID with the ID of the view you will be collecting data from. To get the view ID of a particular view that you have already configured, go to the Admin section, click on the view that you need, and go to View Settings.

Step 4: Building the Required Objects

credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)

# Build the service object
analytics = build('analyticsreporting', 'v4', credentials=credentials)

Step 5: Executing the Method to Get Data

In this step, you need to execute the method to get the data. The below query gets the number of sessions aggregated by country for the last 7 days.

response = analytics.reports().batchGet(
    body={
        'reportRequests': [{
            'viewId': VIEW_ID,
            'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
            'metrics': [{'expression': 'ga:sessions'}],
            'dimensions': [{'name': 'ga:country'}]
        }]
    }
).execute()

Step 6: Parsing JSON and Writing the Contents to a CSV File

import pandas as pd
from pandas.io.json import json_normalize

# Build the list of column names from the report header.
reports = response['reports'][0]
columnHeader = reports['columnHeader']['dimensions']
metricHeader = reports['columnHeader']['metricHeader']['metricHeaderEntries']
columns = list(columnHeader)
for metric in metricHeader:
    columns.append(metric['name'])

# Flatten the nested rows into dimension and metric DataFrames.
data = json_normalize(reports['data']['rows'])
data_dimensions = pd.DataFrame(data['dimensions'].tolist())
data_metrics = pd.DataFrame(data['metrics'].tolist())
data_metrics = data_metrics.applymap(lambda x: x['values'])
data_metrics = pd.DataFrame(data_metrics[0].tolist())
result = pd.concat([data_dimensions, data_metrics], axis=1, ignore_index=True)
result.columns = columns
result.to_csv('reports.csv')

Save the script and execute it. The result will be a CSV file with the following columns:

Id , ga:country, ga:sessions

Step 7: Loading CSV File to PostgreSQL

This file can be loaded directly into a PostgreSQL table using the below command. Please ensure the table is already created.

COPY sessions_table FROM 'reports.csv' DELIMITER ',' CSV HEADER;

The above command assumes you have already created a table named sessions_table. Note that COPY reads the file from the database server's filesystem; if the CSV is on the client machine, use psql's \copy command instead.
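Alternatively, the load can be scripted in Python. Below is a minimal sketch using psycopg2's copy_expert to stream the CSV through the client connection, so the file does not need to live on the database server; the table name, file path, and connection parameters are illustrative assumptions.

```python
def build_copy_sql(table):
    # COPY ... FROM STDIN reads from the client stream, unlike plain
    # COPY FROM 'file', which reads from the server's filesystem.
    return ("COPY {} FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')"
            .format(table))


def load_csv(conn, table, csv_path):
    # `conn` is an open psycopg2 connection (pip install psycopg2-binary).
    with conn.cursor() as cur, open(csv_path) as f:
        cur.copy_expert(build_copy_sql(table), f)
    conn.commit()


# Usage (connection parameters are placeholders):
# import psycopg2
# conn = psycopg2.connect(host="localhost", dbname="analytics",
#                         user="postgres", password="postgres")
# load_csv(conn, "sessions_table", "reports.csv")
```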

You now have your Google Analytics data in your PostgreSQL table. Now that we know how to get the Google Analytics data using custom code, let's look into the limitations of this method.

Limitations of using Manual ETL Scripts to Connect Google Analytics to PostgreSQL

  • The above method requires you to write a lot of custom code. Google’s output JSON structure is complex, and you may have to change the above code according to the data you query from the API.
  • This approach is fine for a one-off data load to PostgreSQL, but in a lot of cases, organizations need to run it periodically, merging new data points every day while handling duplicates. This will force you to write a very complex import tool just for Google Analytics.
  • The above method addresses only one of the APIs available for Google Analytics. There are many other Google Analytics APIs that provide different types of data; an example is the Real Time Reporting API. These APIs come with different output JSON structures, and developers will need to write separate parsers for each.
  • The APIs are rate-limited, which means the above approach will lead to errors unless complex logic is implemented to throttle the API calls.
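To work within these quotas, each request can be wrapped in a retry helper with exponential backoff. The sketch below is generic and illustrative; in practice you would pass `is_retryable` a check for googleapiclient.errors.HttpError responses with HTTP status 429 or 403.

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0, is_retryable=None):
    # Run call(), retrying with exponential backoff on retryable errors.
    # By default every exception is retried until max_retries is exhausted.
    if is_retryable is None:
        is_retryable = lambda exc: True
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries - 1 or not is_retryable(exc):
                raise
            # Wait 1s, 2s, 4s, ... plus jitter, all scaled by base_delay.
            time.sleep((2 ** attempt + random.random()) * base_delay)


# Usage with the Reporting API client built earlier (request_body is the
# same dict passed to batchGet in Step 5):
# response = with_backoff(
#     lambda: analytics.reports().batchGet(body=request_body).execute()
# )
```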

A solution to all the above problems is to use a completely managed ETL solution like Hevo which provides a simple click and execute interface to move data from Google Analytics to PostgreSQL.

Method 2: Using Hevo Data to Connect Google Analytics to PostgreSQL


The best way to bypass the above constraints is to use a Data Pipeline Platform like Hevo (14-day free trial) that works out of the box. Hevo can help you import data from Google Analytics to PostgreSQL for free in two simple steps:

Step 1: Connect Hevo to Google Analytics to set it up as your source by filling in the Pipeline Name, Account Name, Property Name, View Name, Metrics, Dimensions, and the Historical Import Duration.


Step 2: Load data from Google Analytics to PostgreSQL by providing your PostgreSQL database credentials like Database Host, Port, Username, Password, Schema, and Name, along with the destination name.


Hevo will do all the heavy lifting to ensure that your data is securely moved from Google Analytics to PostgreSQL. Hevo automatically handles all the schema changes that may happen at Google Analytics’ end. This ensures that you have a dependable infrastructure that delivers error-free data in PostgreSQL at all times.

Here are a few benefits of using Hevo:

  • Easy-to-use Platform: Hevo has a straightforward and intuitive UI to configure the jobs.
  • Transformations: Hevo supports complex transformations for data on the move. It provides preload transformations through Python code and also lets you run transformation code for each event in the Data Pipelines you set up. To do so, edit the properties of the event object received as a parameter in the transform method. Hevo also offers drag-and-drop transformations like Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before being put to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Real-time Data Transfer: Support for real-time synchronization across a variety of sources and destinations. 
  • Fully Managed: Completely managed service which spares the users of all infrastructure and maintenance activities. 
  • Automatic Schema Mapping: Hevo can automatically detect your source’s schema type and match it with the schema type of your destination. 

Conclusion

This blog talks about the two methods you can deploy to connect Google Analytics to PostgreSQL in a seamless fashion. It also gives a brief introduction to Google Analytics and PostgreSQL, including their salient features and benefits, before jumping into the steps to move data from Google Analytics to PostgreSQL.

Visit our Website to Explore Hevo

In addition to Google Analytics (a Free Source), Hevo natively integrates with tons of other applications (Databases, Marketing and Sales Applications, Analytics Applications, etc.), ensuring that you have a reliable partner to move data to PostgreSQL at any point.

Get started with Hevo today!

Sign Up for a 14-day free trial.

No-code Data Pipeline for PostgreSQL