Connect BigQuery to Amplitude Analytics: A Comprehensive Guide 101

on Activate, Amplitude, bigquery datasets, BigQuery Functions, Data Integration, Data Migration, Data Warehouse, Google BigQuery • June 20th, 2022 • Write for Hevo

bigquery to amplitude: FI

Google BigQuery is a cloud-based enterprise Data Warehouse that allows users to run SQL queries quickly and analyze large datasets interactively. BigQuery is a read-only data processing engine based on Google’s Dremel Technology.

Amplitude is a product analytics platform that uses Collaborative Analytics to help businesses track visitors. The platform uses behavioral reports to understand how users interact with products and provides real-time insights to help users get more done.

This article talks about how to Connect Bigquery to Amplitude in a few simple steps. In addition to that, it also describes Amplitude and Bigquery briefly.

Table Of Contents

What is Amplitude?

bigquery to amplitude: amplitude logo
Image Source

Developed in 2012, Amplitude is a product analytics platform used for mapping the customer’s journey on websites and applications. Product Analysis is a process to understand how customers get engaged with your products or services. Although Amplitude is used mainly for the Saas (Software as a Service) businesses that host an application online and make it available for the customers through subscriptions, you can use it for almost every business.

For example, if you own an e-commerce website, Amplitude can keep track of your customers’ purchases and the products they have added to their carts. You can notice their behavior by analyzing how many customers have added your products to their cart or have purchased your products after signing up for your website. As a result, Amplitude helps businesses to understand interactions between customers and products in real-time. Such capabilities allow organizations to identify shortcomings in products by analyzing user experience and eliminating bottlenecks by optimizing the website or application.

Key Features of Amplitude

  • Event Segmentation: Event segmentation in Amplitude is used to create charts from events that show what your users are doing with your products. It can help you build analysis on:
    • Measuring the top events performed over a selected period.
    • Analyzing how often the events are fired.
    • Detecting the count of users who is firing events in your product.
    • Clarify those users who are firing events.
  • Event Module: Events in Amplitude are actions taken in your product. Events could be like pushing a button, completing a level, or making a payment. To know how the users are involved in your product, you should be able to aim to track between 15 and 200 events. Amplitude can also track inactive events. Inactive events are actions that are not taken directly by the end-users but still occur within the websites, like push notifications.
  • Retention Analysis: Retention Analysis in Amplitude refers to analyzing how customers return to your product after taking a particular action. However, with Amplitude, you can also have the privilege of not choosing a particular event for retention. Amplitude can compute retention data by comparing the date of the start event to the end event. The Retention analysis chart in Amplitude shows the percentage of active users who have fired your interested events within a specified time like a month, week, or daily.
  • User Cohorting: User Cohorting in Amplitude helps you to identify the behavior of your customers. Cohorts are a group of customers who share some common traits or set of traits. Cohorts are defined by the customer’s actions within a specified period. Different customers who fired the same events are grouped in Cohorts. For example, suppose you own an OTT platform, users who watched 4 to 5 consecutive episodes of a particular series after signing in fall under the same cohorts.

Seamlessly Sync Data from Warehouse to Business Applications Using Hevo Activate

With Hevo Activate, you can empower business teams with 360° customer profiling — sync customer and product usage data available in Data Warehouses, such as Amazon Redshift, Snowflake, and BigQuery, to most common Business Applications like Salesforce, Google Sheets, HubSpot, and Intercom, etc., in a hassle-free & effortless manner.

GET STARTED WITH HEVO ACTIVATE FOR FREE

Business Teams can now make faster and smarter decisions with Hevo Activate by analyzing user journeys and creating personalized experiences for the customers.

Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Sync!

What is Google BigQuery?

bigquery to amplitude: bigquery logo
Image Source

Google BigQuery is a highly scalable, serverless, and multi-cloud Data Warehouse that uses a build-in query engine. It is a highly scalable serverless, fully-featured, fully manageable Data Warehouse that enables scalable analysis over petabytes of data. It is developed by Google and launched on 19th May 2010. It is designed such that it uses the processing power of Google’s infrastructure that makes a single SQL query to analyze petabytes of data in seconds.

BigQuery is also called SQL-based Data Warehouse as a Service (DWaaS) with zero infrastructure management. It is a serverless warehouse that does not require any upfront hardware provisioning or management.  BigQuery runs SQL Queries and all requests are to be authenticated. Google provides a complete package to their users with big data loading features on Google Cloud Storage and connectivity with various Google apps like Apps Script. Google BigQuery has many built-in features like Machine Learning and AI Capabilities, Geospatial Analysis, and Business Intelligence. 

bigquery to amplitude: bigquery working
Image Source

Key Features of Google BigQuery

  • Big Query is a serverless warehouse with zero infrastructure management.
  • Big Query offers infinite storage with high performance in analyzing petabyte-sized datasets in seconds. 
  • Query languages like SQL allow users to query data through Big Query.
  • The big query can analyze Big Data across multiple cloud platforms with BigQuery Omni.
  • BigQuery offers built Machine Learning and Artificial Intelligence capabilities.
  • Real-time analytics is possible with Big Query. It integrates real-time data quickly and efficiently and makes analysis available for business.
  • Data Visualization is available in a variety of graphic formats.
  • BigQuery provides information about location and mapping using BigQuery Geographic Information Systems (GIS).

Connect BigQuery to Amplitude 

To connect BigQuery to Amplitude an indirect method is used, first, you connect BigQuery to CSV and then CSV to Amplitude.

BigQuery to Amplitude: Connect BigQuery to CSV

To connect BigQuery to Amplitude, you first connect BigQuery to CSV and that can be done in the following ways:

Method 1: BigQuery Console Export

This method is used as an indirect method to connect BigQuery to Amplitude. The Web UI, also known as the console, which you can find here, is by far the easiest way to export your data to a CSV file. This is an excellent choice if you only need to export your data once. To begin, double-check that you’re in the correct project, which you can do by looking to the left of the search bar at the top of the screen. After that, you can proceed to the following steps: 

  • Step 1: Expand the right project and Dataset in the Explorer panel. Choose which table you’d like to export from the drop-down menu.
  • Step 2: Click Export, then Export to GCS (Google Cloud Storage) on the far right.
  • Step 3: Choose the GCS bucket and Filename you want to export the data to from the menu that appears, such as yourbucket/yourfolder/yourdata.csv. If necessary, you can use the browse option to find the right bucket, or you can create one first. You can save the file in a format other than CSV or compress it with GZIP to make it smaller.
  • Step 4: Go to GCS and navigate through the buckets until you find the CSV file you saved. A download icon appears to the right of the filename.

You should now have a CSV file with the data you need to connect BigQuery to Amplitude.

Why Hevo’s Reverse-ETLing Capabilities are Best-in-class

Hevo Activate, the Reverse-ETL platform, automates the manual data enrichment job, directly syncing data to the destination of your choice in real-time. Unify customer & product data from data warehouses such as Amazon Redshift, Snowflake, and BigQuery to SaaS Software such as Salesforce, Google Sheets, HubSpot, and Intercom, etc., in a hassle-free & automated manner.

Here’s how Hevo is different:

  • Real-Time Data Replication: Hevo Activate, with its strong integration with various data sources, allows you to transfer data quickly & efficiently. This ensures efficient utilization of bandwidth on both ends.
  • No APIs, CSVs, Just SQL!: Use the powerful and smart SQL Query Editor to filter and transform data from your data warehouse table to a Target object. The in-built validation checks ensure each query is validated before you save it.
  • Secure: Hevo Activate has a fault-tolerant architecture that ensures that the data is handled safely and consistently with zero data loss.
  • On-Demand Sync: Hevo Activate gives users the option to resume the sync or run sync now to execute data sync as per their demand.
  • Intelligent Data Type Conversion: During the mapping action, Hevo Activate transforms the field types of the synced data automatically. 
  • Data Transformation: Hevo Activate provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Live Support: The Hevo Activate team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

Method 2: bq Command-line Tool

  • This is a way to connect BigQuery to Amplitude using a Command-line Tool. BigQuery has its Command-line Tool, bq, which you can use in the console or from a local terminal after installing the Google Cloud SDK. You can find it in the console by clicking here or on the terminal icon on the right side of the search bar. While it isn’t as simple as the previous option, once you’ve memorized the commands, it can be faster than clicking through several menus. The following is the basic syntax for exporting a CSV file:
bq extract [--FLAGS] RESOURCE DESTINATION
  • The table you’re exporting is RESOURCE, and the bucket and location in GCS to which you’re exporting is DESTINATION. FLAGS are optional and allow you to do the following:
    • If applicable, select a Compression Type.
    • Select a File Format, with CSV as the default.
    • For CSV files, select a Delimiter.
    • Choose whether or not to include the header, with TRUE being the default.
  • The command would look like this if you used all of the available flags to export the well-known Austin bike-sharing data from BigQuery’s public dataset.
bq extract --compression=GZIP --destination_format=CSV --field_delimiter=tab --print_header=false bigquery-public-data:austin_bikeshare.bikeshare_stations gs://your_bucket/your_folder/your_file.csv.gzip
  • The flags’ accepted values are listed in the reference documentation. The CSV file will be exported to your GCS using the extract command. If you have the Google Cloud SDK installed, you can download the file from your GCS using the command below in your local terminal.
gsutil cp gs://your_bucket/your_folder/your_file.csv.gzip /your_local_folder
  • If you frequently need to export BigQuery data, and especially if you’re familiar with cron jobs, the Google Cloud SDK can help.

Method 3: Client Library

  • Client Libraries can be used as a method to Connect BigQuery to Amplitude. The ability to use the Client Library to export CSV files If you’re familiar with BigQuery, you’ll know that you can use a client library to access the BigQuery API in a variety of languages, including Node.js, Java, and Python.
  • This option requires the most time and effort to set up, but once it is up and running, it can be seamlessly integrated into an automated workflow. Google has a quick start guide to help you get started with your first search. You can export your data once you’re ready to run your first query by adapting the example (Python) code below to your needs.
# Import library
from google.cloud import bigquery

# Declare client
client = bigquery.Client()
  • If you get a DefaultCredentialsError when declaring a Client, you can easily fix it with this solution.
# Parameters for your case 
project = 'bigquery-public-data'
dataset_id = 'austin_bikeshare'
table_id = 'bikeshare_stations'
table_location = 'US' # Open the table in the BigQuery web UI, go to the details tab and look for 'Data location' 

bucket_name = 'your_bucket'
file_location = 'your_folder/austin_bikeshare.csv'
destination_uri = f'gs://{bucket_name}/{file_location}'
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

# Exporting the entire table
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location = table_location,
)

extract_job.result() 
  • The entire table has been saved as a CSV file in GCS. The file can be downloaded using the code below.
from google.cloud import storage

bucket_name = 'your_bucket'
source_file_name = 'austin_bikeshare.csv'
destination_file_name = 'austin.csv'

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_file_name)

blob.download_to_filename(destination_file_name)
  • You can also use a standard SQL query to limit the data exported and save the result in a CSV file without using GCS.
# Querying a subset of the table
sql_query = ("""
	SELECT station_id, name, status 
	FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
    	WHERE status = 'active'
	""")


# Storing the data in a pandas DataFrame
df = client.query(sql_query).to_dataframe()

# Saving the DataFrame to a CSV
df.to_csv('filename.csv')
  • Then, once you’ve run the code above, you’re ready to continue your data journey with a CSV in your hand (or file structure).

BigQuery to Amplitude: Connect CSV to Amplitude

Next, to connect BigQuery to Amplitude, you need to connect CSV to Amplitude. To use the segment csv importer to import data into Amplitude, you’ll need two things.

  • In CSV format, save your historical data.
  • Make a key (find in your segment account)

Follow the steps below to upload your data.

  • Step 1: Go to the Segment CSV Importer website.
bigquery to amplitude: csv to amplitude segment csv importer
Image Source
  • Step 2: In the Write field, type in the Write key.
  • Step 3: To upload your data-filled CSV file, select it from the drop-down menu. Click the Import button after you’ve uploaded the file. After that, the data will be sent to Segment, which will then forward it to Amplitude. The following fields could be present in your CSV file:
    • Action
    • User ID
    • Anonymous ID (either user id or anon id is required)
    • Timestamp
    • Events
    • Properties
  • Step 4: You can send any information about the user as a person’s property or track their previous actions. It also includes a few sample CSV files to assist you in getting started quickly.
  • Step 5: The data will be displayed in amplitude once you upload this file to Segment CSV Importer. With this, you have successfully connected BigQuery to Amplitude.

Conclusion

This article explains the simple steps to connect BigQuery to Amplitude. It also gives an overview of Bigquery and Amplitude.

visit our website to explore hevo

Hevo Activate, a Reverse ETL Platform provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations, with a few clicks. Hevo Activate with its strong integration with other sources like Google BigQuery allows you to not only export data from your desired data sources & load it to the destination of your choice.

Want to take Hevo for a spin? 

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Sync Data from Google BigQuery to Business Applications Seamlessly