Connecting BigQuery to Python: Easy Guide

on Data Integration, Tutorials • October 9th, 2020

Are you looking to move your data by connecting BigQuery to Python? This guide will let you do exactly that. Read on to learn more about BigQuery, why you should connect it to Python, and how to do so.


Introduction to Google BigQuery

Google BigQuery is a completely managed cloud-based data warehouse that can process petabytes of data. A completely managed data warehouse allows customers to focus on their core business problems rather than spending time developing and maintaining a custom data warehouse of their own. Beyond standard data warehouse features like a comprehensive SQL layer and super-fast querying, BigQuery provides a suite of accompanying services that further enhance its value proposition.

BigQuery ML helps customers run machine learning models through simple SQL. Connected Sheets allows data from BigQuery to be analyzed through Google Sheets. A recent interesting addition, though still in alpha, is the BigQuery QnA service that allows querying in natural language constructs. BigQuery’s data processing abilities go beyond data stored in its own storage. It can also process data from many external sources like Google Cloud Storage, Bigtable, Cloud SQL, Google Drive, etc. Recent developments also allow it to run on other cloud providers through BigQuery Omni. This post covers how to use Python to connect to BigQuery and process data.

Why Connect BigQuery to Python?

Organizations that use BigQuery for their data warehousing needs often need to access its data from programming languages. The typical requirement emerges from the need to transform data into forms suitable for consumption, or to move it to other databases for specific use cases. Python is a very popular language choice for such needs because of its data manipulation abilities and its ease of integration with data processing frameworks like Spark.

Hevo: Migrate your Data Easily

Hevo is a No-code Data Pipeline. It supports pre-built data integrations from 100+ data sources, including Google BigQuery. Hevo offers a fully managed solution for your data migration process. It automates your data flow in minutes without requiring you to write a single line of code. Its fault-tolerant architecture ensures that your data is secure and consistent. Hevo provides a truly efficient and fully automated solution to manage data in real time and always have analysis-ready data in your desired destination.

Let’s look at some salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Explore more about Hevo by signing up for a 14-day free trial today.

Connecting BigQuery to Python

Google provides libraries for most of the popular languages to connect to BigQuery. The list of supported languages includes Python, Java, Node.js, Go, etc. The first step in connecting BigQuery to any programming language is to set up the required dependencies. You will begin this tutorial by installing the Python dependencies.

  1. Install the Python BigQuery dependency as follows.
pip install --upgrade google-cloud-bigquery
  2. You will now go to the Google Cloud service accounts page and set up a service account to access BigQuery from external libraries. Click on New Service Account and provide a name for the account. Ensure that you select the role as Owner or Editor. The account identifier will be prefilled automatically.
  3. Click on Create and you will notice the browser prompting you to download a JSON file. Download the file and keep it for future use. You will use the path of the saved file in the next step.
  4. You will now set up the environment variables for the Python script to use while accessing BigQuery.
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/my-key.json"

Ensure that you replace the path in the above command with the actual path to your saved credential file.
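Alternatively, if you would rather not rely on an environment variable, the client library can also build a client directly from the key file. Here is a minimal sketch, assuming the same key path as above.

from google.cloud import bigquery

# Build a client straight from the service account key file instead of
# relying on the GOOGLE_APPLICATION_CREDENTIALS environment variable.
bigquery_client = bigquery.Client.from_service_account_json(
    "/home/user/Downloads/my-key.json"
)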

  5. You will now use the Python client library to create a simple script that accesses data from one of the public datasets available in BigQuery. The first step is to get the imports right.
from google.cloud import bigquery
  6. Initialize the client as below.
bigquery_client = bigquery.Client()
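If the project cannot be inferred from your credentials or environment, you can also pass the project ID explicitly. The "my-gcp-project" value below is a placeholder for your own project ID.

# Pass the project ID explicitly if it cannot be inferred automatically.
bigquery_client = bigquery.Client(project="my-gcp-project")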
  7. Form the query as follows. The query accesses a public dataset in BigQuery that holds data about names in the USA, groups the rows by name, and returns the 100 most common names along with their total counts.
name_group_query = """
    SELECT name, SUM(number) AS total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total_people DESC
    LIMIT 100
"""
  8. Initiate the job as follows. A job in BigQuery is nothing but a query execution. Since query executions can be long-running, they are addressed using the term job.
query_results = bigquery_client.query(name_group_query)
  9. The last step is to print the result of the query using a loop.
for result in query_results:
    print(str(result[0]) + "," + str(result[1]))

The above loop will print each name and its total count, separated by a comma.
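If you plan to analyze the results further in Python, the client library can also hand them over as a pandas DataFrame, assuming the optional pandas and db-dtypes packages are installed.

# Requires the optional dependencies: pip install pandas db-dtypes
df = query_results.to_dataframe()
print(df.head())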

That covers the steps involved in connecting Google BigQuery to Python. A typical use case for this approach is when you need to move data from BigQuery to another database or schedule an extraction process. Even though this approach may serve the use case, it comes with a number of challenges.

  1. Most such requirements come with the need for scheduling, and building a reliable scheduler can become a time-consuming affair; a sketch of what that entails follows this list.
  2. Most such extractions require duplicates and deletions to be handled at the destination database. Building such logic with queries and Python code can be a pain.
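To make the first challenge concrete, here is a minimal sketch of the kind of polling loop you would have to build and then harden yourself. The run_extraction function is a hypothetical stand-in for the query logic shown above.

import logging
import time

logging.basicConfig(level=logging.INFO)

def run_extraction():
    # Hypothetical placeholder for the BigQuery extraction logic shown above.
    logging.info("Running BigQuery extraction...")

INTERVAL_SECONDS = 60 * 60  # run the extraction once an hour

while True:
    try:
        run_extraction()
    except Exception:
        # A production scheduler also needs retries, backoff, and alerting.
        logging.exception("Extraction failed")
    time.sleep(INTERVAL_SECONDS)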

If those challenges feel like too much work, you may find solace in Hevo, a completely managed cloud-based ETL tool. Hevo can connect to BigQuery and move data to most of the common databases. Hevo possesses excellent scheduling abilities and can transform data on the fly.

It supports pre-built integrations from 100+ data sources at a reasonable price. With Hevo, you can integrate Google Cloud Platform with your desired destination in minutes.

Explore more about Hevo by signing up for a 14-day free trial today.

Have any further queries? Get in touch with us in the comments section below.
