API to BigQuery: Real-time Data Loading Using Python

on Tutorial • July 19th, 2019 • Write for Hevo

Many businesses today use a variety of cloud-based application for day to day business. This could include:

  1. CRMs like Salesforce, HubSpot, etc.
  2. Marketing Application like Mailchimp, Sendgrid, etc.
  3. Support Systems like Intercom, Zendesk, etc. 

Companies today are also very keen to combine this data with data from various other sources to measure key metrics that help them grow. Given most of the cloud applications are own and run by third-party vendors – the applications expose their APIs in order to help companies extract the data into a data warehouse – say Google BigQuery. This blog details out the process you would need to follow to move data from API to BigQuery. The blog highlights various methods, their shortcomings along with the workarounds. Let’s get started.

What is an API?

API (Application Programming Interface) is a service that two applications use to communicate with each other. 

For simplicity, assume API to be like the Menu Card in a restaurant. You look at the menu, list down and communicate the items you want to order. Eventually, you will have your dish delivered to your table. 

Similarly, APIs allow engineers to see the various services/calls that can be done, communicate what they want and in response, API will fetch the relevant result for them. 

You can read more on APIs here.

When a call is made to an API, often it returns the data in JSON or XML format.

Some APIs require authentication (API Key or a Client ID and Client Secret, similar to a username and password, so to speak) to control their usage, and some do not. 

What is Google BigQuery?

Fully managed by Google, BigQuery is a cloud data warehouse that is known to store and analyze huge amounts of data in a matter of seconds.

It is cost-effective, highly and easily scalable, serverless and pretty much works out of the box. This makes it one of the favourite warehouse choices for modern businesses. 

You can read more about BigQuery features here.

Moving Data from API To Google BigQuery

You can use any of the following approaches to move data form API to BigQuery.

Method 1: A code-free Data Integration platform like Hevo Data will help you load data through a visual interface in real-time. You can sign up for a 14-day free trial here to explore this.

Method 2: Hand code ETL scripts and schedule cron jobs to move data from API to Google BigQuery.

Moving Data from API To Google BigQuery

Method 2: Hand code ETL scripts and schedule cron jobs to move data from API to Google BigQuery.

Moving Data from API To Google BigQuery

Get Started For Free

This post talks about Method 2 in detail. You will also find the limitations and challenges that you may face in this approach. Towards the end of the post, we have compared the two methods so that you have the right data points to choose a path that suits your needs the best. 

API to BigQuery ETL Using Custom Code:

The steps involved in migrating data from API to BigQuery are as follows:

  1. Getting your data out of your application using API
  2. Preparing the Data that was extracted from Application 
  3. Loading data into Google BigQuery

Step 1: Getting data out of your application using API

Below are the steps to extract data from the application using API.

Get the API URL from where you need to extract the data. In this article, you will learn how to use Python to extract data from ExchangeRatesAPI.io which is a free service for current and historical foreign exchange rates published by the European Central Bank. The same method should broadly work for any API that you would want to use.API URL = https://api.exchangeratesapi.io/latest?symbols=USD,GBPIf you click on URL you will get below result:

{      "rates":{         "USD":1.1215,       "GBP":0.9034    },    "base":"EUR",    "date":"2019-07-17" }

Reading and Parsing API response in python:

a. To handle API response will need two important libraries

import requests import json 

b. Connect to URL and get the response

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP" response = requests.get(url) 

c. Convert string to JSON format

parsed = json.loads(data)

d. Extract data and print

date = parsed["date"] gbp_rate = parsed["rates"]["GBP"] usd_rate = parsed["rates"]["USD"]

Here is the complete code:

import requests

import json

url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP"

response = requests.get(url)

data = response.text

parsed = json.loads(data)

date = parsed["date"]

gbp_rate = parsed["rates"]["GBP"]

usd_rate = parsed["rates"]["USD"]

print("On " + date + " EUR equals " + str(gbp_rate) + " GBP")

print("On " + date + " EUR equals " + str(usd_rate) + " USD")

Step 2: Preparing Data received from API

There are two ways to load data to BigQuery. 

  1. You can save the received JSON formated data on JSON file and then load into BigQuery.
  2. You can parse the JSON object, convert JSON to dictionary object and then load into BigQuery.

Step 3: Loading data into Google BigQuery

We can load data into BigQuery directly using API call or can create CSV file and then load into BigQuery table.

  • Create a Python script to extract data from API URL and load (UPSERT mode)  into BigQuery table. Here UPSERT is nothing but Update and Insert operations. This means – if the target table has matching keys then update data, else insert a new record.
import requests 
import json from google.cloud
import bigquery url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP" 
response = requests.get(url) 
data = response.text 
parsed = json.loads(data) 
base = parsed["base"] 
date = parsed["date"] 
client = bigquery.Client() 
dataset_id = 'my_dataset' 
table_id = 'currency_details' 
table_ref = client.dataset(dataset_id).table(table_id) 
table = client.get_table(table_ref) for key, value in parsed.items():        if type(value) is dict:            for currency, rate in value.items():                QUERY = ('SELECT target_currency FROM my_dataset.currency_details where currency=%', currency)                query_job = client.query(QUERY)                               if query_job == 0:                    QUERY = ('update my_dataset.currency_details set  rate = % where currency=%',rate, currency)                    query_job = client.query(QUERY)                else:                    rows_to_insert = [                    (base, currency, 1, rate)                    ]                    errors = client.insert_rows(table, rows_to_insert) assert errors == []
  • Load JSON file to BigQueryYou need to save the received data in JSON file and load JSON file to BigQuery table.
import requests import json from google.cloud import bigquery url = "https://api.exchangeratesapi.io/latest?symbols=USD,GBP" response = requests.get(url) data = response.text parsed = json.loads(data) for key, value in parsed.items():        if type(value) is dict:            with open('F:Pythondata.json', 'w') as f:                json.dump(value, f) client = bigquery.Client(project="analytics-and-presentation") filename = 'F:Pythondata.json' dataset_id = ‘my_dayaset’' table_id = 'currency_rate_details' dataset_ref = client.dataset(dataset_id) table_ref = dataset_ref.table(table_id) job_config = bigquery.LoadJobConfig() job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON job_config.autodetect = True with open(filename, "rb") as source_file:    job = client.load_table_from_file(source_file, table_ref, job_config=job_config) job.result()  # Waits for table load to complete. print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id)) 

Limitations of writing custom Scripts and developing ETL to load data from API to BigQuery:

  1. The above code is written based on the current source as well as target destination schema. If either the data coming in from the source or the schema on BigQuery changes, ETL process will break.
  2. In case you need to clean your data from API – say transform time zones, hide personally identifiable information and so on, the current method does not support it. You will need to build another set of processes to accommodate that. Clearly, this would also need you to invest extra effort and money.
  3. You are a serious risk of data loss if at any point your system breaks. This could you be anything from source/destination not being reachable, script breaks and more. You would need to invest upfront in building systems and processes that capture all the fail points and consistently move your data to the destination.
  4. Since python is interpreted language it might cause performance issue to extract from API and load data into BigQuery. 
  5. For many APIs, we would need to supply credentials to access API. It is a very poor practice to pass credentials as a plain text in python script. You will need to take additional steps to ensure your pipeline is secure. 

An Easy way to Load Data from API to BigQuery:

Using a simple-to-use, secure Data Integration Platform like Hevoyou can move data from API to BigQuery in a matter of few minutes – that too, without writing any code.

Here are the steps to move data from API to BigQuery using Hevo:

  • Connect your API endpoint by providing necessary details.

Connect API End Point on Hevo

  • Connect and configure your BigQuery Warehouse where your data needs to be moved.

Select BigQuery Destination on Hevo

Yes, that is all. Hevo will do all the heavy lifting to ensure that your analysis-ready data is moved to BigQuery, in a secure, efficient and reliable manner. 

In addition to this, Hevo can natively connect to 100s of other data sources and bring them to your warehouse seamlessly. 

Sign up for a 14-day free trial here to experience super simple data integration to BigQuery. 

No-code Data Pipeline for BigQuery