API to BigQuery: Real-time Data Loading Using Python

on Tutorial • July 19th, 2019 • Write for Hevo

Many businesses today use a variety of cloud-based applications for day-to-day business. This could include:

  1. CRMs like Salesforce, HubSpot, etc.
  2. Marketing Applications like Mailchimp, Sendgrid, etc.
  3. Support Systems like Intercom, Zendesk, etc. 

Companies today are also keen to combine this data with data from various other sources to measure key metrics that help them grow. Since most cloud applications are owned and run by third-party vendors, the applications expose their APIs to help companies extract the data into a data warehouse – say, Google BigQuery. This blog details the process you would need to follow to move data from API to BigQuery. It also highlights various methods and their shortcomings, along with workarounds. Let’s get started.


Introduction to API

API (Application Programming Interface) is a service that two applications use to communicate with each other. 

For simplicity, assume an API to be like the menu card in a restaurant. You look at the menu, pick the items you want, and communicate your order. Eventually, your dish is delivered to your table.

Similarly, APIs let engineers see the various services/calls that are available and communicate what they want; in response, the API fetches the relevant result for them.

You can read more on APIs here.

When a call is made to an API, often it returns the data in JSON or XML format.

Some APIs require authentication (API Key or a Client ID and Client Secret, similar to a username and password, so to speak) to control their usage, and some do not. 
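As a sketch of how such authentication is usually supplied, an API key is typically sent either as a request header or as a query parameter. The header name, parameter name, and key below are placeholders for illustration – check the API's own documentation for the exact scheme it expects.

```python
API_KEY = "YOUR_API_KEY"  # placeholder; keep real keys out of source code

def build_request_kwargs(api_key, in_header=True):
    """Return keyword arguments for requests.get() that carry the API key."""
    if in_header:
        # Bearer-token style authentication, common for modern REST APIs
        return {"headers": {"Authorization": "Bearer " + api_key}}
    # Some APIs expect the key as a query parameter instead
    return {"params": {"access_key": api_key}}

# Usage with the requests library:
#   requests.get(url, **build_request_kwargs(API_KEY))
```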

Introduction to Google BigQuery

Fully managed by Google, BigQuery is a cloud data warehouse that can store and analyze huge amounts of data in a matter of seconds.

It is cost-effective, highly and easily scalable, serverless, and pretty much works out of the box. This makes it one of the favorite warehouse choices for modern businesses. 

You can read more about BigQuery features here.

Moving Data from API To Google BigQuery

You can use any of the following approaches to move data from API to BigQuery.

Method 1: A code-free Data Integration platform like Hevo Data will help you load data through a visual interface in real-time. You can sign up for a 14-day free trial here to explore this.

Method 2: Hand-code ETL scripts and schedule cron jobs to move data from API to Google BigQuery.


This post talks about Method 2 in detail. You will also find the limitations and challenges that you may face in this approach. Towards the end of the post, we have compared the two methods so that you have the right data points to choose a path that suits your needs the best. 

API to BigQuery ETL Using Custom Code

The steps involved in migrating data from API to BigQuery are as follows:

  1. Getting your data out of your application using API
  2. Preparing the Data that was extracted from Application 
  3. Loading data into Google BigQuery

Step 1: Getting data out of your application using API

Below are the steps to extract data from the application using API.

Get the API URL from which you need to extract the data. In this article, you will learn how to use Python to extract data from ExchangeRatesAPI.io, a free service for current and historical foreign exchange rates published by the European Central Bank. The same method should broadly work for any API that you want to use.

API URL: https://api.exchangeratesapi.io/latest?symbols=USD,GBP

If you open this URL, you will get the following result:

{
  "rates": {
    "USD": 1.1215,
    "GBP": 0.9034
  },
  "base": "EUR",
  "date": "2019-07-17"
}

Reading and parsing the API response in Python:

a. To handle the API response, you will need two important libraries:

import requests
import json

b. Connect to the URL and get the response:

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP"
response = requests.get(url)

c. Convert the response string to JSON format:

data = response.text
parsed = json.loads(data)

d. Extract the data and print it:

date = parsed["date"]
gbp_rate = parsed["rates"]["GBP"]
usd_rate = parsed["rates"]["USD"]

Here is the complete code:

import requests
import json

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP"
response = requests.get(url)
data = response.text
parsed = json.loads(data)

date = parsed["date"]
gbp_rate = parsed["rates"]["GBP"]
usd_rate = parsed["rates"]["USD"]

print("On " + date + " EUR equals " + str(gbp_rate) + " GBP")
print("On " + date + " EUR equals " + str(usd_rate) + " USD")

Step 2: Preparing Data received from API

There are two ways to load data into BigQuery:

  1. You can save the received JSON-formatted data to a JSON file and then load it into BigQuery.
  2. You can parse the JSON object, convert the JSON to a dictionary object, and then load it into BigQuery.
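As a sketch of the second option, the parsed response can be flattened into one dictionary per currency, a shape that maps directly onto BigQuery's row-based loading (e.g. `client.insert_rows_json`) or onto newline-delimited JSON. The row layout below is an illustrative assumption, not a schema the API itself prescribes.

```python
import json

def parsed_to_rows(parsed):
    """Flatten an exchange-rate API response into one dict per currency."""
    return [
        {"base": parsed["base"], "currency": c, "rate": r, "date": parsed["date"]}
        for c, r in parsed["rates"].items()
    ]

sample = json.loads(
    '{"rates":{"USD":1.1215,"GBP":0.9034},"base":"EUR","date":"2019-07-17"}'
)
rows = parsed_to_rows(sample)
# Each dict can be written as one line of newline-delimited JSON,
# or passed straight to client.insert_rows_json(table, rows).
ndjson = "\n".join(json.dumps(row) for row in rows)
```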

Step 3: Loading data into Google BigQuery

You can load data into BigQuery directly using an API call, or you can create a CSV file and then load it into a BigQuery table.

  • Create a Python script to extract data from the API URL and load it (in UPSERT mode) into the BigQuery table. Here, UPSERT means a combination of Update and Insert operations: if the target table has matching keys, update the data; else, insert a new record.
import requests
import json
from google.cloud import bigquery

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP"
response = requests.get(url)
data = response.text
parsed = json.loads(data)
base = parsed["base"]
date = parsed["date"]

client = bigquery.Client()
dataset_id = 'my_dataset'
table_id = 'currency_details'
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref)

for key, value in parsed.items():
    if isinstance(value, dict):
        for currency, rate in value.items():
            # Check whether a row for this currency already exists
            query = ("SELECT currency FROM my_dataset.currency_details "
                     "WHERE currency = @currency")
            job_config = bigquery.QueryJobConfig(query_parameters=[
                bigquery.ScalarQueryParameter("currency", "STRING", currency)])
            existing = list(client.query(query, job_config=job_config).result())
            if existing:
                # Matching key found: update the existing rate
                update = ("UPDATE my_dataset.currency_details SET rate = @rate "
                          "WHERE currency = @currency")
                job_config = bigquery.QueryJobConfig(query_parameters=[
                    bigquery.ScalarQueryParameter("rate", "FLOAT64", rate),
                    bigquery.ScalarQueryParameter("currency", "STRING", currency)])
                client.query(update, job_config=job_config).result()
            else:
                # No matching key: insert a new row
                rows_to_insert = [(base, currency, 1, rate)]
                errors = client.insert_rows(table, rows_to_insert)
                assert errors == []
  • Load a JSON file to BigQuery. You need to save the received data in a JSON file and load the JSON file into the BigQuery table.
import requests
import json
from google.cloud import bigquery

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP"
response = requests.get(url)
data = response.text
parsed = json.loads(data)

for key, value in parsed.items():
    if isinstance(value, dict):
        with open(r'F:\Python\data.json', 'w') as f:
            json.dump(value, f)

client = bigquery.Client(project="analytics-and-presentation")
filename = r'F:\Python\data.json'
dataset_id = 'my_dataset'
table_id = 'currency_rate_details'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open(filename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()  # Waits for the table load to complete.
print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
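The CSV route mentioned in Step 3 can be sketched as follows. The column layout and the dataset/table names are illustrative assumptions, and the load step requires the google-cloud-bigquery package plus configured credentials; the CSV-writing step runs anywhere.

```python
import csv
import json

def rates_to_csv(parsed, path):
    """Write the 'rates' object of an exchange-rate API response to a CSV file."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["base", "currency", "rate", "date"])
        for currency, rate in parsed["rates"].items():
            writer.writerow([parsed["base"], currency, rate, parsed["date"]])

def load_csv_to_bigquery(path, dataset_id, table_id):
    """Load the CSV file into a BigQuery table, skipping the header row."""
    from google.cloud import bigquery  # requires google-cloud-bigquery
    client = bigquery.Client()
    table_ref = client.dataset(dataset_id).table(table_id)
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1
    job_config.autodetect = True
    with open(path, "rb") as source_file:
        job = client.load_table_from_file(source_file, table_ref,
                                          job_config=job_config)
    job.result()  # waits for the load to complete

sample = json.loads(
    '{"rates":{"USD":1.1215,"GBP":0.9034},"base":"EUR","date":"2019-07-17"}'
)
rates_to_csv(sample, "data.csv")
# load_csv_to_bigquery("data.csv", "my_dataset", "currency_rate_details")
```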
Get Started with Hevo for free

Limitations of writing custom Scripts and developing ETL to load data from API to BigQuery

  1. The above code is written based on the current source and target destination schemas. If either the data coming in from the source or the schema on BigQuery changes, the ETL process will break.
  2. In case you need to clean your data from the API – say, transform time zones, hide personally identifiable information, and so on – the current method does not support it. You will need to build another set of processes to accommodate that, which clearly requires extra effort and money.
  3. You are at serious risk of data loss if your system breaks at any point. This could be anything from the source/destination not being reachable to the script breaking, and more. You would need to invest upfront in building systems and processes that capture all the failure points and consistently move your data to the destination.
  4. Since Python is an interpreted language, it may cause performance issues when extracting data from the API and loading it into BigQuery.
  5. For many APIs, you would need to supply credentials to access the API. It is very poor practice to pass credentials as plain text in a Python script. You will need to take additional steps to ensure your pipeline is secure.
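As a minimal sketch of the kind of resilience layer the third limitation calls for, a retry wrapper with exponential backoff can absorb transient API or network failures before they turn into data loss. The attempt count and delays below are illustrative defaults, not values prescribed by any API.

```python
import time

def with_retries(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on any exception.

    Delays grow as base_delay, 2*base_delay, 4*base_delay, ... between
    attempts; the final failure is re-raised so the scheduler can alert.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up: surface the error to the scheduler/alerting
            sleep(base_delay * 2 ** (attempt - 1))

# Example: wrap the API call from Step 1, e.g.
#   data = with_retries(lambda: requests.get(url, timeout=10).text)
```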

An Easy way to Load Data from API to BigQuery:

Using a simple-to-use, secure Data Integration Platform like Hevo, you can move data from API to BigQuery in a matter of minutes – that too, without writing any code.

Sign up here for a 14-day Free Trial!

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

Here are the steps to move data from API to BigQuery using Hevo:

  • Connect your API endpoint by providing the necessary details.

Connect API End Point on Hevo

  • Connect and configure your BigQuery Warehouse where your data needs to be moved.

Select BigQuery Destination on Hevo

Yes, that is all. Hevo will do all the heavy lifting to ensure that your analysis-ready data is moved to BigQuery, in a secure, efficient, and reliable manner. 

Conclusion

From this blog, you would have understood the process you need to follow to move data from API to BigQuery. It also highlighted the various methods and their shortcomings. Using either of these two methods, you can move data from API to BigQuery; with Hevo, you can save a lot of your time!

In addition to this, Hevo can natively connect to 100+ other data sources and bring them to your warehouse seamlessly.

Visit our Website to Explore Hevo

Hevo offers a faster way to move data from Databases or SaaS applications into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code. You can try Hevo for free by signing up for a 14-day free trial. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs!

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

No-code Data Pipeline for BigQuery