Are you trying to connect HubSpot to Redshift? Do you want to consolidate your HubSpot and Redshift data for better analysis? If yes, then you are in the right place. Cloud-based software as a service offerings spare organizations from investing time and effort in building custom software solutions and enables them to focus on their core business. A challenge with using such services is that they limit the organization’s ability to carry out analysis over the data generated. In most cases, organizations will have their own data warehouse which needs to be kept up to date with the customer data present in these SaaS offerings.

Having all the data in one place will help organizations to extract the true capabilities of the Data Warehouse and derive meaningful insights from their data. Hubspot is a Software-as-a-service(SaaS) offering that provides a full suite of software for marketing, sales, and customer service. This post is about the methods to load the data from HubSpot to Redshit, one of the most popular Data Warehouse services.

Introduction to HubSpot

HubSpot to Redshift: HubSpot
Image Source: Google

HubSpot is an American company that develops software products for inbound marketing, customer service, and sales. It will help your company to attract visitors, leads, and customers. They also provide various tools for hosting web and landing pages, creating blogs and emails, managing interaction with leads and customers, and analyzing the campaigns. HubSpot features such as drag and drop, analysis, speed, etc make it a must-have tool.

Introduction to Amazon Redshift

HubSpot to Redshift: Redshift
Image Source: Google

Amazon Redshift is a cloud-based fully managed data warehouse. It can be used for large data set storage and migration as it is based on Massive Parallel Processing (MPP). Redshift’s column-oriented database can connect to Business Intelligence (BI) tools and SQL-based clients. All your data is available in real-time. Redshift’s parallel processing and compressions reduce the execution time and deliver faster results. All these features are available at a reasonable price.

Reliably integrate data with Hevo’s Fully Automated No Code Data Pipeline

Reliably integrate data with Hevo’s Fully Automated No Code Data Pipeline

If yours anything like the 1000+ data-driven companies that use Hevo, more than 70% of the business apps you use are SaaS applications Integrating the data from these sources in a timely way is crucial to fuel analytics and the decisions that are taken from it. But given how fast API endpoints etc can change, creating and managing these pipelines can be a soul-sucking exercise.

Hevo’s no-code data pipeline platform lets you connect over 150+ sources in a matter of minutes to deliver data in near real-time to your warehouse. What’s more, the in-built transformation capabilities and the intuitive UI means even non-engineers can set up pipelines and achieve analytics-ready data in minutes. 

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software in terms of user reviews.

Take our 14-day free trial to experience a better way to manage data pipelines.

Get started for Free with Hevo!

Methods to Move Data from HubSpot to Redshift

You can connect HubSpot to Redshift using the following methods:

Method 1: HubSpot to Redshift – Building Custom Code Using HubSpot APIs

Hubspot provides a rich set of APIs to retrieve the data present in customer accounts that can help you move your data from HubSpot to Redshift. It also provides an easy-to-follow authentication model to enable developers to build custom applications using their APIs. In this tutorial, we will use the Contacts API that gives us a list of all the contacts present in Hubspot. We will then try to decipher the data and load it into Redshift using Redshift’s COPY command. 

1. The first step is to go to the settings page in your Hubspot portal and create API keys. Please note this needs super admin permissions. In the settings panel, navigate to integrations > API key and click the Generate API key button. The keys have a lifetime of 6 months. 

Using the API key, make the following request using the CURL utility

curl -o contacts.json https://api.hubapi.com/contacts/v1/lists/all/contacts/all?hapikey=<api_key>&count=2

The above command will populate the contacts.json file. Before going into the steps, there are some specifics of the API that we need to be familiar with. The count property of the API allows developers to specify the number of entries to be retrieved at a time. The API at a time returns a maximum of 100 entries. If there are more than 100 contacts, it returns the data in a paginated manner. The API response has a parameter called vid-offset that can be used to retrieve the next set of entries after consuming the first page. The developer has to use the value of the vid-offset parameter in the next request with the vidOffset property.

2. Convert the JSON to a CSV using the jq command-line utility. For this tutorial, we will extract the vid, first name, company, and last name attribute of contacts to form a CSV. Use the below command to accomplish this.

cat contacts.json | jq -r ".contacts[] | [.vid,.properties.firstname.value, .properties.company.value,.properties.lastname.value] | @csv" >> contacts.csv

The output of the above command will be copied to a CSV file named “contacts”.

3. Copy the file to AWS S3.  Assuming AWS CLI is already installed and configured, use the below command.

aws s3 cp /contacts/contacts.csv s3://contacts_bucket/

4. We will now use Redshift’s COPY command to load data. COPY command can load data from an S3 bucket, a remote host, an EMR cluster, or DynamoDB. In this case, we will be copying data from AWS S3. Use the below command to execute this.

copy contacts from 's3://contacts_bucket/contacts.csv'  iam_role 'arn:aws:iam::0123456789012:role/User_RedshiftRole'  csv;

Where iam_role is the user’s IAM role. This command requires a contacts table to be already created with an id,first_name, company, and last_name fields.

That concludes the step in copying the data from contacts API to Redshift. In reality, when implementing for a production environment, there are a lot more challenges that need to be solved to create a repeatable process. Let’s have a look at the typical problems that should be addressed in such a custom application.

HubSpot to Redshift: Limitations of Using Custom Code

Some of the limitations of using custom code to connect HubSpot to Amazon Redshift are listed below:

  1. Hubspot has API rate limits to prevent its infrastructure from being overloaded. Every response has rate limit specification headers which specify the maximum number of requests for a day, the maximum number of requests per window, size of the window, etc. The application has to read these headers and make API calls appropriately.
  2. The API returns a paginated response. So the application needs to be intelligent enough to keep track of the vid-offset value through which Hubspot manages pagination to get the full data.
  3. The above approach is limited to be used for a bulk load of data only. Redshift does not enforce any kind of primary key constraints and hence repeated loading of data fetched from the API will result in duplicate rows. If continuous sync (or real-time sync) between Hubspot and Redshift is required, additional logic has to be built.
  4. Redshift’s COPY command has a built-in implicit data type conversion mechanism which can play havoc with your data if not carefully mapped. COPY command can take a JSON configuration file to map the data type correctly if the developer can specify them.

If all of the above proves to be too much work, an alternative elegant solution will be to use a cloud-based ETL tool like Hevo which can abstract away all these details and set up this data load operation in a matter of few clicks.

Method 2: HubSpot to Redshift using Hevo Data

Hevo Data, a no-code data pipeline
Image Source: Self

Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using various BI tools such as Power BI, Tableau, etc. 

Hevo focuses on two simple steps to move your data from HubSpot to Redshift:

  • Configure Source: Connect Hevo Data with HubSpot by providing a unique name for your Pipeline, along with details about your authorized Redshift account. You can also choose the historical sync duration for your HubSpot data.
Configuring HubSpot Source
Image Source: Self
  • Integrate Data: Complete HubSpot to Redshift migration by providing your destination name, account name, region of your account, database username and password, database and schema name, and the Data Warehouse name.
Configuring Amazon Redshift
Image Source: Self

Check Out What Makes Hevo Amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Conclusion

In this blog, you learned about HubSpot, Redshift, and two different approaches to connect HubSpot to Redshift. You can either use HubSpot API to build custom scripts or use a fully automated data pipeline, Hevo to connect HubSpot to Redshift.

Visit our Website to Explore Hevo

Hevo is a No-code Data Pipeline and has awesome 100+ pre-built integrations that you can choose from. Hevo can help you integrate your data from numerous sources and load them into a destination to analyze real-time data with a BI tool such as Tableau. It helps transfer data from HubSpot to a destination of your choice for free. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure. Check out the pricing details here. Try Hevo by signing up for a 14-day free trial and see the difference! Experience the power of Hevo first hand. Watch this short overview video to get a sense of how Hevo works:

Simplify your data analysis with Hevo today! Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

What is your preferred approach to move data from HubSpot to Redshift? Do let us know in the comments section below. 

Talha
Software Developer, Hevo Data

Talha is a seasoned Software Developer, currently driving advancements in data integration at Hevo Data, where he have been instrumental in shaping a cutting-edge data integration platform for the past four years. With a significant tenure at Flipkart prior to their current role, he brought innovative solutions to the space of data connectivity and software development.

No-code Data Pipeline for Redshift