HubSpot to BigQuery: Move Data Instantly

Tutorial • May 7th, 2020

Do you have enormous amounts of customer and marketing data in HubSpot? Are you looking to bring this data into a data warehouse such as BigQuery so that you can leverage the power of SQL to analyze it better? If yes, then you have landed on the right blog. 

This post walks through the steps to move data from HubSpot to BigQuery. 

Understanding HubSpot

HubSpot’s Software-as-a-Service has a wide range of offerings that allow a business to modernize the process of attracting visitors, converting leads, and closing deals. Here are some of HubSpot’s features:

  1. Marketing Hub – helps you attract more of the right people by creating and sharing useful content, running tailored ad campaigns, and engaging leads through personalized messages. 
  2. Sales Hub – gives your sales team everything they need to sell in more personalized, relevant, and efficient ways – not just for your sales reps, but for your customers.
  3. Service Hub – helps you engage with your customers, guide them to solutions, and turn them into promoters who help grow your company. Today, word of mouth matters more than ever: happy customers stay longer and refer their friends, while unhappy ones leave and tell the world. If you want to grow, you need to invest in creating a great customer experience. 

HubSpot brings all these features together into a single shared view of all your customers. It empowers salespeople with the tools they need to generate leads, acquire, engage, and retain customers online.

Understanding Google BigQuery

Google BigQuery is a powerful cloud data warehouse platform that enables fast SQL queries against append-only tables using the processing power of Google’s infrastructure. By using Google BigQuery to store and analyze data points, businesses can work more productively to enhance their customer engagement. When you integrate your HubSpot CRM data into BigQuery, you can immediately tap into a rich set of big data analytics tools that help you unearth crucial lead and business insights.

Why migrate your HubSpot Account Data to BigQuery?

There are broadly two reasons to move data from your HubSpot account to Google BigQuery.

  1. Data Backup: You store valuable customer data in HubSpot. There is a possibility that you lose access to it due to human error, a canceled account subscription, and so on. It is therefore essential to back up your CRM data in a cloud data warehouse.
  2. Data Analytics: When you back up your HubSpot data to Google BigQuery tables, you get a highly scalable data store with cutting-edge data analytics tools. Using BigQuery, you can run custom SQL queries on your contact and campaign data and generate custom reports and dashboards. You can even combine this data with data from the other data sources used by your business to make it even more valuable.

Note that you can export almost everything from HubSpot:

  • Sales data
  • Tasks
  • Calls-to-Action (CTA) analytics
  • Contact lists
  • CRM data
  • Marketing emails
  • Customer feedback
  • Form submission data
  • Notes about each contact, calls, meetings
  • Activity data (clicks, opens, views, URL redirects, etc.)

Moving Data from HubSpot to Google BigQuery:

To export data from HubSpot to BigQuery, you have 2 options:

  1. Build Custom Scripts: You would export HubSpot’s data using their REST API and then load it into BigQuery using the bq command-line tool. This approach needs developer involvement.
  2. Using Hevo Automated Data Pipelines: This is a much easier option as it allows you to get the same job done in minutes, without having to write any code.

This blog post walks through the process of extracting data from HubSpot and exporting it to Google BigQuery using HubSpot’s REST API. You will probably spend more time reading this post than you would using Hevo’s GUI to perform a HubSpot to BigQuery ETL. So, if this manual process appears too demanding, check out Hevo, which does all the hard work for you in a simple, intuitive process so that you can focus on what matters: analyzing your data. 

Prerequisites:

  1. A HubSpot account with admin access.
  2. A Google Cloud Platform account with billing enabled.
  3. The Google Cloud SDK installed on your machine.
  4. The Google Cloud SDK configured to point to your Google Cloud project.
  5. The BigQuery API enabled on your Google Cloud project.
  6. bigquery.admin permissions, which give you the ability to load data into tables in your BigQuery dataset.

HubSpot to BigQuery: Step-by-Step

1. Obtain your HubSpot API key

  • Navigate to the main navigation bar in your HubSpot account and click the Settings icon.
  • In the left sidebar menu, go to Integrations > API key.
  • Click Generate API key.
  • If you had previously generated an API key, click Show to display your API key.

  • Record this key.

2. Export HubSpot data to JSON

API calls to HubSpot should be made to the https://api.hubapi.com base domain. JSON will be returned in all responses.

To get all the contacts that have been created in your HubSpot account, run:

curl --request GET \
  --url 'https://api.hubapi.com/contacts/v1/lists/all/contacts/all?hapikey=YOUR_HUBSPOT_API_KEY' \
  -o contacts_data.json

Sample response (this example shows contact list objects; other endpoints return similarly structured, paginated JSON):

{
  "lists": [
    {
      "dynamic": true,
      "metaData": {
        "processing": "DONE",
        "size": 161,
        "error": "",
        "lastProcessingStateChangeAt": 1284705947989,
        "lastSizeChangeAt": 1579479579246
      },
      "name": "test emails",
      "filters": [
        [
          {
            "filterFamily": "PropertyValue",
            "withinTimeMode": "PAST",
            "checkPastVersions": false,
            "type": "string",
            "property": "email",
            "value": "test",
            "operator": "STR_STARTS_WITH"
          }
        ]
      ],
      "portalId": 55748,
      "createdAt": 1396370538484,
      "listId": 1,
      "updatedAt": 1492585836314,
      "listType": "DYNAMIC",
      "internalListId": 330881,
      "deleteable": true
    },
    {
      "dynamic": false,
      "metaData": {
        "processing": "DONE",
        "size": 2,
        "error": "",
        "lastProcessingStateChangeAt": 1175279425836,
        "lastSizeChangeAt": 1196347964521
      },
      "name": "staticTest",
      "filters": [
        
      ],
      "portalId": 77123,
      "createdAt": 1253682414252,
      "listId": 3,
      "updatedAt": 1358942467316,
      "listType": "STATIC",
      "internalListId": 5
    }
  ],
  "offset": 2,
  "has-more": true
}
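Note that the response above is a single nested JSON object, while BigQuery's NEWLINE_DELIMITED_JSON format expects one JSON object per line. A minimal Python sketch of flattening the saved response into newline-delimited JSON (the key "lists" matches the sample above; use the matching key for whichever endpoint you export):

```python
import json

def to_ndjson(response, key):
    """Flatten the array under `key` into one JSON object per line."""
    records = response.get(key, [])
    return "\n".join(json.dumps(record) for record in records)

# Usage sketch: rewrite the curl output in place so the later
# `bq load` commands can read it line by line.
# with open("contacts_data.json") as f:
#     data = json.load(f)
# with open("contacts_data.json", "w") as f:
#     f.write(to_ndjson(data, "lists"))
```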

Here are some of the other API calls you might use:

  • Analytics API
  • Automation API
  • Companies API
  • CMS API
  • CRM API
  • Events API
  • Email Events API
  • Engagements API
  • Marketing API
  • Products API
  • Social Media API
  • etc.
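Endpoints like these are paginated: as in the sample response above, `has-more` tells you whether another page exists, and `offset` is the cursor to pass on the next request (some endpoints name the cursor differently, e.g. `vid-offset` for contacts). A sketch of the paging loop in Python, with the HTTP details abstracted behind a `fetch_page` callable that stands in for your own request code:

```python
def fetch_all(fetch_page):
    """Collect every record by following has-more/offset cursors.

    fetch_page(offset) must return a page dict shaped like the sample
    response above: {"lists": [...], "offset": N, "has-more": bool}.
    """
    records, offset = [], 0
    while True:
        page = fetch_page(offset)
        records.extend(page.get("lists", []))
        if not page.get("has-more"):
            return records
        offset = page["offset"]  # cursor for the next request
```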

3. Create a BigQuery dataset

From your gcloud command-line, run this command:

bq mk hubspot_dataset

4. Create an empty table in your hubspot_dataset.

bq mk \
  --table \
  --expiration 86400 \
  --description "Contacts table" \
  --label organization:development \
  hubspot_dataset.contacts_table

The BigQuery API allows you to load data in the following formats: AVRO, CSV, PARQUET, ORC, or NEWLINE_DELIMITED_JSON.

5. Loading data into your BigQuery table from your local machine

The following command loads a newline-delimited JSON file contacts_data.json from your local machine into a table named contacts_table in the hubspot_dataset. You need to define the contacts schema in a local schema file named contacts_schema.json. Notice that since you’re loading files from your local machine, you need to specify the data format explicitly.

bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  hubspot_dataset.contacts_table \
  ./contacts_data.json \
  ./contacts_schema.json
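The command above reads the table schema from contacts_schema.json. BigQuery schema files are JSON arrays of column definitions; a minimal sketch might look like this (the field names are illustrative and must match the keys in your exported records):

```json
[
  {"name": "vid", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"},
  {"name": "createdate", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
```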

Normally, you will want to back up your HubSpot data at least daily, or perhaps multiple times per day. You can load additional data into an existing BigQuery table from source files on your local machine by running the following command.

6. Appending to a table using a local file

  bq load \
    --autodetect \
    --noreplace \
    --source_format=NEWLINE_DELIMITED_JSON \
    hubspot_dataset.contacts_table \
    ./contacts_data.json

This command loads data from a file named contacts_data.json and appends it to the contacts_table table in hubspot_dataset. Because of the --autodetect flag, the schema is inferred from the data itself rather than read from a schema file. Use --replace instead of --noreplace if you want each load to overwrite the table's contents.

7. Scheduling recurring load jobs

  • Create your bash shell script.
     $ sudo mkdir -p /bin/scripts && sudo touch /bin/scripts/backup.sh
    Add the following content to the file and save it.
    #!/bin/bash
    bq load \
      --autodetect \
      --replace \
      --source_format=NEWLINE_DELIMITED_JSON \
      hubspot_dataset.contacts_table \
      ./contacts_data.json
    Note that ./contacts_data.json is relative to the directory cron runs the script from (your home directory by default), so use an absolute path if the exported file lives elsewhere.
  • From your CLI, edit crontab.
     $ crontab -e
  • You will be prompted to edit the file that lists the tasks to be run by cron. Add the following line to schedule the job for 6 PM daily.
    0 18 * * * /bin/scripts/backup.sh 
  • Make your backup.sh file executable.
    $ chmod +x /bin/scripts/backup.sh 
  • With this in place, cron will execute the backup.sh file at 6 PM daily.

Limitations of this approach:

  1. HubSpot’s APIs are subject to rate limiting. These limits are applied per account for custom integrations using API keys.
  2. Wildcards are not supported when you load files using this method. Each file must be loaded individually, which quickly becomes inefficient and resource-intensive.
  3. Cron jobs can’t alert you if your load jobs fail or never start.
  4. Pulling data from HubSpot’s public endpoints using API keys is subject to daily limits.
  5. You need to specify a separate schema in BigQuery for the data extracted from each API endpoint.
  6. This method does not give you access to data in real time. If your use case demands real-time data, this method is not suitable.
  7. If you need to clean, transform, and enrich your data before loading it into the data warehouse, you would need to write additional code to achieve that. 
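The rate limits mentioned above are usually handled by retrying with exponential backoff when HubSpot answers with HTTP 429. A sketch of that retry logic in Python, kept independent of any HTTP library; `call` and `RateLimited` are placeholders for your own request wrapper:

```python
import time

class RateLimited(Exception):
    """Raised by your HTTP wrapper on an HTTP 429 response."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

The `sleep` parameter exists so the waiting behavior can be swapped out (e.g. in tests); in production you would leave it as the default.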

HubSpot to Google BigQuery: Exploring a Simpler Alternative – Hevo Data

Hevo is an automated data pipeline that provides an easy-to-use interface with the ability to copy data from HubSpot to BigQuery without writing any code. Hevo enables the lowest time to production for such copy operations, allowing developers to focus on their core business logic rather than waste time on the configuration nightmares involved in setting these up.

Hevo can help you load data in 2 easy steps:

  1. Authenticate and connect your HubSpot data source
  2. Connect the BigQuery data warehouse to load data

That is it. Hevo takes care of all the heavy lifting to ensure that the data from HubSpot is loaded into your BigQuery data warehouse, in a hassle-free fashion, in real time. 

Here are more reasons to try Hevo:

  1. Simplicity – Hevo has a point and click interface that is super intuitive to use. With Hevo, you can start loading data from HubSpot to BigQuery in just a few clicks, without any developer assistance.
  2. Real-time Data – Hevo’s real-time streaming architecture ensures that data moves from HubSpot to BigQuery instantly, without any delay. This allows you to gain meaningful insights into your CRM and marketing data in real time.
  3. Reliable and Consistent Data Load – Hevo’s fault-tolerant architecture ensures that your data is loaded in a consistent and reliable fashion without any data loss.
  4. Scalability – Hevo is built to handle data of any scale. Additionally, Hevo can bring data from a wide range of data sources (sales and marketing applications, analytics applications, databases and more). Both these aspects ensure that Hevo can help you scale your data infrastructure as your data needs grow.

Sign up for a 14-day free trial with Hevo to start moving data from HubSpot to BigQuery instantly.
