Google Analytics Data Mining Simplified 101

on API, Data Analytics, Google Analytics, Python • May 23rd, 2022

E-commerce has changed the way people shop and the way businesses operate. Many offline stores are also launching online, as it gives them a platform to expand and grow. E-commerce sites use digital marketing to promote their services, increase sales, and improve ROI (Return on Investment). For online companies, it’s essential to extract their website data and analyze it for future growth.

Google Analytics plays a vital role in website data analysis. It gives companies a free platform to assess their current situation and implement tactics to grow further. In this article, you will learn about the Google Analytics Data Mining process and the steps to transform this data into useful analysis and reports.


What is Google Analytics?


Google Analytics is a web analytics service offered by Google that measures website traffic and creates analysis reports. It is part of the Google Marketing Platform brand. Google launched Google Analytics on November 14, 2005. It is used to track website activity, such as the duration of each session, pages viewed per session, the bounce rate of visitors, and the source of the traffic.

Google Analytics is a widely used free web analytics tool. It provides in-depth insight into the online performance of your website and business. It can be integrated with Google Ads to launch online campaigns that promote and sell your products and increase traffic to your website. It offers a wealth of data that companies can use to evaluate their website performance, plan an effective digital marketing strategy, and change tactics to achieve the best results.

Google Analytics can be used for both websites and mobile apps. It analyzes website data and creates customized reports as per business needs.

Replicate Google Analytics Data in Minutes Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources (including 40+ Free Data Sources such as Google Analytics, and Google Analytics 360 ) straight into your Data Warehouse or any Databases.

To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

Get Started with Hevo for Free

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

Features of Google Analytics

The main features of Google Analytics are:

Traffic Measurement 


Traffic measurement is the primary and most common report generated by Google Analytics. It shows the number of people visiting your site every day. In more detail, it also shows peak hours, which helps you analyze trends over time and across seasons and, ultimately, make digital marketing decisions.

User Activity and Conversion

User activity, such as the pages users visit most, can be tracked easily with Google Analytics. It also helps in tracking the conversion ratio of users, for example how many filled out contact or subscription forms or bought products online.
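
As a rough illustration of the conversion ratio mentioned above, the sketch below divides completed conversions by sessions. The function name and the sample numbers are hypothetical and are not taken from Google Analytics itself.

```python
def conversion_rate(conversions: int, sessions: int) -> float:
    """Return conversions as a percentage of sessions (0.0 when there are no sessions)."""
    if sessions == 0:
        return 0.0
    return 100.0 * conversions / sessions


# Hypothetical example: 1,200 sessions, 36 of which ended in a form submission.
rate = conversion_rate(36, 1200)
print(f"Conversion rate: {rate:.1f}%")
```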

Audience Reports


Audience reports show a detailed picture of users, such as their geographic location, gender, and key browsing behaviors. They also show users’ interests in different product categories. Business owners can use this information to encourage both new and returning users to make more purchases.

Flow Visualization Reports

A flow visualization report tracks every step a user takes, from the starting page to how they explore your website and backtrack. It helps you analyze what interests users most and where exactly they lose interest in your website. Google Analytics generates several types of Flow Visualization Reports, one of which is a behavior tracking report.

Custom Reports 

As the name suggests, these are customized reports created according to business needs. Google Analytics offers many templates for creating custom reports. These save a lot of time by providing ready-made insight reports on mobile performance, page timing, keyword analysis, and more.

What is Data Mining?


In plain terms, Data Mining is the extraction of valuable information from large volumes of data, such as the data generated by websites. It uses mathematics, statistics, and Machine Learning algorithms to extract useful information and convert it into insights. Data Mining separates noise from the data and concentrates on the valuable part.

Data mining is closely related to data science: in both, people work on specific datasets with specific goals in a given scenario. It covers various types of mining, such as text mining, web mining, audio and video mining, image data mining, and social media mining.

Stages of Data Mining


There are four stages of the Data Mining Process:

  • Data Gathering: Data gathering is the first and a crucial step of Data Mining. It is essential to identify the right data to gather, and collecting accurate data from various sources takes a lot of effort. It is also vital to merge all the data into the same format.
  • Data Pre-Processing: Data preprocessing takes almost 90% of the time in data mining. It involves many steps: data cleaning, noise removal, handling outliers and missing data, and transforming the data to prepare it for analysis.
  • Data Analysis: After preprocessing, the data is ready for analysis. Various graphs and reports are created from the data to discover important information, and these help in making crucial business decisions.
  • Data Interpretation: Based on the analysis, the data is interpreted for better understanding. This means giving meaning to the results of the analysis according to the relevant context.
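
The four stages above can be sketched end to end on a toy dataset. This is only a minimal illustration using the Python standard library: the session counts, the outlier limit of 1000, and the target of 100 sessions are all made-up assumptions, and real preprocessing is usually far more involved.

```python
from statistics import mean

# Stage 1, data gathering: hypothetical daily session counts from two sources.
# None marks a missing reading and 9000 is an obvious tracking glitch.
web_sessions = [120, 135, None, 128, 9000, 131]
app_sessions = [80, 82, 79, None, 85, 81]


def preprocess(values, upper_limit=1000):
    """Stage 2: drop missing values and outliers above an assumed upper limit."""
    return [v for v in values if v is not None and v <= upper_limit]


def analyze(values):
    """Stage 3: summarize the cleaned series."""
    return {"days": len(values), "mean": mean(values), "max": max(values)}


def interpret(name, summary, target=100):
    """Stage 4: turn the numbers into a statement against an assumed target."""
    status = "above" if summary["mean"] >= target else "below"
    return f"{name}: average of {summary['mean']:.1f} sessions is {status} the target of {target}"


for name, raw in [("web", web_sessions), ("app", app_sessions)]:
    print(interpret(name, analyze(preprocess(raw))))
```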

What Makes Hevo’s ETL Process Best-In-Class

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated platform empowers you with everything you need to have for a smooth data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources like Google Analytics, Google Analytics 360 (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Google Analytics Data Mining using Python API

Python and R are the languages most used by today’s Data Scientists and Data Analysts, and both can be used to extract data from Google Analytics. In this article, we will use the Python API to carry out the Google Analytics Data Mining process. Follow the steps below in order:

Google Analytics Data Mining Step 1: Python Setup

The first step in the Google Analytics Data Mining process is installing and setting up Python with Anaconda on your system. Follow the link to install and set up Python with Anaconda on a Windows system.

Google Analytics Data Mining Step 2: Project Creation and Enable API in Google Developer Console

After setting up Python, you need to create a project in the Google Developer Console. After creating the project, enable two APIs: the Google Analytics Reporting API and the Analytics API. Enabling the APIs lets you obtain a client ID and client secret, which you can download as a JSON file.

Next, obtain the service account key and download it for future use. Finally, grant the service account access to Google Analytics.

For the detailed procedure of each step, follow the link.

Google Analytics Data Mining Step 3: Install Google API Client in Python

In the command prompt window, run the following pip command to install the Google API client for Python and oauth2client.

pip install --upgrade google-api-python-client oauth2client

After successful installation, you will receive the following messages:

Successfully installed oauth2client-4.1.3
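
If you want to double-check the installation from Python rather than from the pip output, a small sketch like the one below can help. It relies only on the standard library; the list of module names is an assumption based on what these two packages normally provide (`googleapiclient`, `oauth2client`, and their `httplib2` dependency).

```python
import importlib.util

# Top-level module names expected after the pip command above (an assumption).
REQUIRED_MODULES = ["googleapiclient", "oauth2client", "httplib2"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are importable.")
```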

Google Analytics Data Mining Step 4: Connect Python to Google Analytics Reporting API

Now, it’s time to connect Python to Google Analytics Reporting API. First, rename your JSON key file to client_secrets.json and save it in the current working folder.

Open a Jupyter Notebook and run the Python code below:

#Load Libraries
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
import httplib2

#Create service credentials
credentials = ServiceAccountCredentials.from_json_keyfile_name('client_secrets.json', ['https://www.googleapis.com/auth/analytics.readonly'])
 
#Create a service object
http = credentials.authorize(httplib2.Http())
service = build('analytics', 'v4', http=http, discoveryServiceUrl=('https://analyticsreporting.googleapis.com/$discovery/rest'))

Once the connection is made successfully with Google Analytics Reporting API, move to the next step of the Google Analytics Data Mining process.
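
If the connection fails, a common cause is a missing or wrong key file. The sketch below is a hypothetical helper, not part of the Google client library, that checks whether client_secrets.json looks like a service account key before it is handed to ServiceAccountCredentials. The list of expected fields is an assumption based on the usual layout of such files.

```python
import json

# Fields that a service account key file normally contains (assumed subset).
EXPECTED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(path="client_secrets.json"):
    """Return a list of problems found in the key file (an empty list means it looks usable)."""
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except FileNotFoundError:
        return [f"{path} not found in the current working folder"]
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]

    if not isinstance(key, dict):
        return [f"{path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {key['type']}")
    return problems


for problem in check_key_file():
    print(problem)
```

An empty result only means the file has the expected shape; it does not prove that the service account has been granted access in Google Analytics.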

Google Analytics Data Mining Step 5: Make First API Call

In this step of the Google Analytics Data Mining process, you need the View ID of the Google Analytics view that your service account is authorized for to make the first API call. To find the View ID, go to Google Analytics > Admin > View > View Settings and copy it from there. Enter the View ID in the Python code below and run it to make the first API call.

response = service.reports().batchGet(body={
        'reportRequests': [
            {
                'viewId': 'XXXXXXXXXXXX', #Add the copied View ID
                'dateRanges': [{'startDate': '30daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:sessions'}],
                'dimensions': [{"name": "ga:pagePath"}], #Get Pages
                "filtersExpression":"ga:pagePath=~products;ga:pagePath!@/translate", #Filter by condition "containing products"
                'orderBys': [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}],
                'pageSize': 100
            }]
    }
).execute()

To understand the Google Analytics Data Mining code better and learn more about filtering, go to Google Documentation Page.
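
To make the request easier to reuse, the body can be built by a small function. The sketch below is one possible way to do that; the function name and its parameters are hypothetical, while the keys of the request are the same ones used in the call above. In the filter expression, `=~` is a regular-expression match, `!@` means "does not contain", and `;` combines the two conditions with a logical AND.

```python
def build_report_request(view_id, path_regex, excluded_substring, days=30, page_size=100):
    """Build the batchGet body for sessions per page, sorted from most to fewest sessions."""
    # '=~' is a regex match, '!@' is "does not contain", ';' joins conditions with AND.
    filters = f"ga:pagePath=~{path_regex};ga:pagePath!@{excluded_substring}"
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "today"}],
                "metrics": [{"expression": "ga:sessions"}],
                "dimensions": [{"name": "ga:pagePath"}],
                "filtersExpression": filters,
                "orderBys": [{"fieldName": "ga:sessions", "sortOrder": "DESCENDING"}],
                "pageSize": page_size,
            }
        ]
    }


body = build_report_request("XXXXXXXXXXXX", "products", "/translate")
# With the service object from Step 4, the call would then be:
# response = service.reports().batchGet(body=body).execute()
```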

Google Analytics Data Mining Step 6: Parsing The Report Data

The next step in the Google Analytics Data Mining process is to extract the data for use in further analysis. First, create two empty lists to store the dimension and metric values. Then, read the response from the previous step and extract the required data.

#create two empty lists to hold our dimensions and sessions data
dim = []
val = []
 
#Extract Data
for report in response.get('reports', []):
 
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])
 
    for row in rows:
 
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])
 
        for header, dimension in zip(dimensionHeaders, dimensions):
            dim.append(dimension)
 
        for i, values in enumerate(dateRangeValues):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(int(value))
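
To see what the loop above is walking through, it can help to run the same logic on a small hand-written response. The sketch below wraps the parsing in a hypothetical function, `parse_report`, that returns one dictionary per row. The sample response is made up to match the shape the loop expects, and the code assumes a single date range and integer metrics such as ga:sessions.

```python
def parse_report(response):
    """Flatten a Reporting API v4 style response into a list of {header: value} dictionaries."""
    records = []
    for report in response.get("reports", []):
        header = report.get("columnHeader", {})
        dimension_names = header.get("dimensions", [])
        metric_names = [
            entry.get("name")
            for entry in header.get("metricHeader", {}).get("metricHeaderEntries", [])
        ]
        for row in report.get("data", {}).get("rows", []):
            record = dict(zip(dimension_names, row.get("dimensions", [])))
            # Only one date range was requested, so the first metrics entry is used.
            first_range = (row.get("metrics") or [{}])[0]
            values = first_range.get("values", [])
            record.update({name: int(value) for name, value in zip(metric_names, values)})
            records.append(record)
    return records


# Hand-written sample in the expected shape (not real traffic data).
sample_response = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:pagePath"],
            "metricHeader": {"metricHeaderEntries": [{"name": "ga:sessions", "type": "INTEGER"}]},
        },
        "data": {"rows": [
            {"dimensions": ["/products/shoes"], "metrics": [{"values": ["42"]}]},
            {"dimensions": ["/products/hats"], "metrics": [{"values": ["17"]}]},
        ]},
    }]
}

print(parse_report(sample_response))
```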

Google Analytics Data Mining Step 7: Store Data in DataFrame and Export to CSV

In the last step of the Google Analytics Data Mining process, you store the data in a pandas DataFrame and then export it to a CSV file. Data stored in a DataFrame is also easy to handle and analyze.

#Load pandas
import pandas as pd

#Create new dataframe
data_fr = pd.DataFrame()
data_fr["Sessions"]=val
data_fr["pagePath"]=dim
data_fr=data_fr[["pagePath","Sessions"]]
#Export dataframe to CSV
data_fr.to_csv("page_by_session.csv")

Once you have the data stored in a CSV file or a data frame, you can easily do advanced analytics and data visualizations.
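
As a small example of such follow-up analysis, the sketch below computes each page's share of total sessions from a CSV in the layout written above (an unnamed index column followed by pagePath and Sessions). It uses only the standard library so it runs without pandas. The function name and the sample rows are made up for illustration.

```python
import csv
import io

# Sample in the layout written by data_fr.to_csv(); the rows are hypothetical.
sample_csv = io.StringIO(
    ",pagePath,Sessions\n"
    "0,/products/shoes,42\n"
    "1,/products/hats,17\n"
    "2,/products/bags,41\n"
)


def session_share(csv_file):
    """Return (pagePath, sessions, percent of total) tuples, largest share first."""
    rows = [(row["pagePath"], int(row["Sessions"])) for row in csv.DictReader(csv_file)]
    total = sum(sessions for _, sessions in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda item: item[1], reverse=True)
    return [(path, sessions, round(100 * sessions / total, 1)) for path, sessions in ranked]


for path, sessions, share in session_share(sample_csv):
    print(f"{path}: {sessions} sessions ({share}%)")
```

To run it on the exported file instead of the sample, open page_by_session.csv with `open("page_by_session.csv", newline="")` and pass the file object to `session_share`.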

Conclusion

In this article, you have learned how to carry out the Google Analytics Data Mining process. At first glance, mining data from Google Analytics looks tricky, but once you understand it and get hands-on, it is super fun. Next, try to extract data from your own website and explore it with different analytics and visualizations. Data visualizations and reports will help you understand your website better and give a new dimension to your business.

As you collect and manage your data across several applications and databases in your business, it is important to consolidate it for complete performance analysis. However, it is a time-consuming and resource-intensive task to monitor the Data Connectors continuously. To achieve this efficiently, you need to assign a portion of your engineering bandwidth to Integrate data from all sources, Clean & Transform it, and finally, Load it to a Cloud Data Warehouse, BI Tool, or a destination of your choice for further Business Analytics. All of these challenges can be comfortably solved by a Cloud-based ETL tool such as Hevo Data.   

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline can transfer real-time data from a vast sea of 100+ sources like Google Analytics to a Data Warehouse, Databases, BI Tool, or a Destination of your choice. It is a reliable, completely automated, and secure service that doesn’t require you to write any code!  

If you are using Google Analytics as your Web Analytics platform and searching for a no-fuss alternative to Manual Data Integration, then Hevo can effortlessly automate this for you. Hevo, with its strong integration with 100+ sources and BI tools (including 40+ Free Sources like Google Analytics 4 and Google Analytics 360), allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out the pricing details to understand which plan fulfills all your business needs.

Tell us about your experience of carrying out the Google Analytics Data Mining process! Share your thoughts with us in the comments section below.
