Heroku is a cloud Platform as a Service (PaaS) that allows developers to build, run, and operate applications, with support for programming languages such as Node.js, Scala, Ruby, Python, and Go. Heroku PostgreSQL offers an advanced open source database as a secure, trusted, and scalable service that is optimized for developers. To analyze the data that you pull from this database, you’ll have to move it to a fully managed data warehouse that allows you to extract valuable, actionable insights from it. This is where Google BigQuery comes in. It is a highly scalable Data Warehouse with a built-in query engine.
In this article, you’ll go through two methods to seamlessly move data from Heroku for PostgreSQL to BigQuery: using custom code and a no-code Data Pipeline solution, Hevo.
Heroku for PostgreSQL to BigQuery Integration Methods
Here are the two methods you can implement for Heroku for PostgreSQL to BigQuery migration:
Method 1: Using Hevo as a Heroku for PostgreSQL to BigQuery Connector
Hevo is a fully managed, automated No-code Data Pipeline that can load data from 150+ Sources (including 40+ free sources), such as Heroku for PostgreSQL, to BigQuery.
Hevo can also enrich and transform the data into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Using Hevo, Heroku for PostgreSQL to BigQuery Migration can be done in the following 2 steps:
Configure Heroku for PostgreSQL as a Source
- First, you need to log in to your Heroku account.
- Next, you need to choose the app containing the PostgreSQL database and open the databases dashboard.
- You can access the DATA tab and click on the PostgreSQL database you wish to use.
- Next, click on Settings > View Credentials.
- You can use the credentials shown on this page while setting up your PostgreSQL source in Hevo.
- Next, in the Configure your Heroku PostgreSQL Source page, you need to mention the following:
- Database Host: The Heroku PostgreSQL host’s DNS or IP address.
- Pipeline Name: A unique name for your Pipeline.
- Database User: The read-only user who has the permissions to read tables in your database.
- Database Port: This refers to the port on which your PostgreSQL server is listening for connections. The default value is 5432.
- Database Password: This refers to the password for the read-only user.
- Database Name: The database that you want to replicate.
- Select an Ingestion Mode: This refers to the desired mode by which you want to ingest data from the source. The available ingestion modes are Table, Logical Replication, Custom SQL, and XMIN.
- For Ingestion Mode as Logical Replication, you need to follow the steps in the documentation for your PostgreSQL variant to set up logical replication.
- For Ingestion Mode as Table, refer to the Object and Query Mode Settings section below for steps to configure the objects that you want to replicate.
- Connection Settings:
- Use SSL: Enable this to use an SSL-encrypted connection. You should also enable it if you’re using Heroku PostgreSQL databases. To enable this, you need to provide the following:
- Client Certificate: This refers to the client public key certificate file.
- CA File: The file containing the SSL server certificate authority (CA).
- Client Key: The client private key file.
- Connect through SSH: You can enable this option to connect Hevo using an SSH tunnel, as opposed to directly connecting your PostgreSQL database host to Hevo. This lends an additional layer of security to your database by not exposing your PostgreSQL setup to the public. If this option is disabled, you need to whitelist Hevo’s IP address.
- Click TEST & CONTINUE to proceed with setting up the Destination.
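Heroku's View Credentials page also lists the same credentials as a single connection URI. The following is a minimal sketch, assuming the URI has the usual `postgres://user:password@host:port/dbname` shape, of how you might split it into the individual fields listed above. The function name `split_heroku_uri` and the example URI are hypothetical and not part of Hevo or Heroku.

```python
from urllib.parse import urlparse, unquote


def split_heroku_uri(uri: str) -> dict:
    """Split a postgres:// URI into the fields the Source page asks for."""
    parts = urlparse(uri)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "Database Host": parts.hostname,
        # Fall back to the PostgreSQL default port when the URI omits it.
        "Database Port": parts.port or 5432,
        # User and password may be percent-encoded inside a URI.
        "Database User": unquote(parts.username or ""),
        "Database Password": unquote(parts.password or ""),
        "Database Name": parts.path.lstrip("/"),
    }


# Hypothetical URI, for illustration only.
fields = split_heroku_uri("postgres://reader:s3cr%40t@db.example.com:5432/d1abcdefgh")
for name, value in fields.items():
    print(f"{name}: {value}")
```

Splitting the URI programmatically avoids copy-and-paste mistakes, particularly with passwords that contain percent-encoded characters.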
Object and Query Mode Settings
Once you have specified the Source connection settings described above, do one of the following:
- For Pipelines configured with the Table mode:
- In the Select Objects page, select the objects you want to replicate and click CONTINUE.
- Note: Each object represents a table in your database.
- In the Configure Objects page, specify the query mode you want to use for each selected object.
Note: In Full Load mode, Hevo attempts to replicate the full table in a single run of the Pipeline, with an ingestion limit of 25 Million rows.
- For Pipelines configured with the XMIN mode:
- In the Select Objects page, select the objects you want to replicate.
- For the selected objects, only new and updated records are ingested using the XMIN column.
- The Edit Config option is unavailable for the objects selected for XMIN-based ingestion. You cannot change the ingestion mode for these objects post-Pipeline creation.
- Click CONTINUE.
Note: Each object represents a table in your database.
- For Pipelines configured with the Custom SQL mode:
- In the Provide Query Settings page, enter the custom SQL query to fetch data from the Source.
- In the Query Mode drop-down, select the query mode, and click CONTINUE.
You can check the official documentation for more details.
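To make the XMIN mode more concrete: PostgreSQL keeps a system column named `xmin` on every row, holding the ID of the transaction that inserted or last updated that row version. The sketch below shows one way an incremental read could be built on it. It illustrates the general idea only and is not Hevo's implementation; it ignores transaction ID wraparound, and the function name `build_xmin_query` is hypothetical. The `%s` placeholder follows the parameter style used by drivers such as psycopg2.

```python
def build_xmin_query(table: str, last_xmin: int) -> tuple:
    """Return (sql, params) selecting rows written after `last_xmin`."""
    # Quote the identifier so unusual table names cannot break the SQL.
    quoted = '"' + table.replace('"', '""') + '"'
    sql = (
        f"SELECT xmin::text::bigint AS row_xmin, * FROM {quoted} "
        "WHERE xmin::text::bigint > %s "
        "ORDER BY row_xmin"
    )
    # The checkpoint is passed as a query parameter, not formatted into the SQL.
    return sql, (last_xmin,)


# Hypothetical table name and checkpoint value.
sql, params = build_xmin_query("orders", 1200)
print(sql)
print(params)
```

After each run, the highest `row_xmin` seen would be stored as the checkpoint for the next run, so only new and updated records are fetched.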
Configure BigQuery as a Destination
To set up Google BigQuery as a destination in Hevo, follow these steps:
- Step 1: In the Asset Palette, select DESTINATIONS.
- Step 2: In the Destinations List View, click + CREATE.
- Step 3: Select Google BigQuery from the Add Destination page.
- Step 4: Choose the BigQuery connection authentication method on the Configure your Google BigQuery Account page.
- Step 5: Connect in one of the following ways:
- Using a Service Account to connect:
- Attach the Service Account Key file.
- Note that Hevo only accepts key files in JSON format.
- Click CONFIGURE GOOGLE BIGQUERY ACCOUNT.
- Using a user account to connect:
- To add a Google BigQuery account, click +.
- Log in as a user with BigQuery Admin and Storage Admin permissions.
- To grant Hevo access to your data, click Allow.
- Step 6: Set the following parameters on the Configure your Google BigQuery page:
- Destination Name: A unique name for your Destination.
- Project ID: The ID of the BigQuery project that you want to load data into and for which you granted access in the previous steps.
- Dataset ID: The name of the dataset that you want to sync your data to.
- GCS Bucket: The cloud storage bucket in which files are staged before they are uploaded to BigQuery.
- Step 7: Click Test Connection to test connectivity with the Google BigQuery warehouse.
- Step 8: Once the test is successful, click SAVE DESTINATION to complete the Heroku for PostgreSQL to BigQuery integration.
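Since Hevo only accepts Service Account Key files in JSON format, it can help to check the file before attaching it. The sketch below is one possible pre-check, assuming the key follows the standard Google Cloud service account key layout with `type`, `project_id`, `private_key`, and `client_email` fields. The function name `check_service_account_key` and the sample values are hypothetical.

```python
import json

# Fields found in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> str:
    """Validate key-file contents and return the project ID they refer to."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError("key file is not valid JSON") from exc
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError("key file is not a service account key")
    return key["project_id"]


# Hypothetical key contents, for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-analytics-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "loader@my-analytics-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```

The returned project ID should match the Project ID you enter on the Configure your Google BigQuery page.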
Using manual scripts and custom code to move data into the warehouse is cumbersome. Frequent breakages, pipeline errors, and lack of data flow monitoring make scaling such a system a nightmare. Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work.
- Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs.
- Stay in Total Control: When automation isn’t enough, Hevo offers flexibility, including data ingestion modes, ingestion and load frequency, JSON parsing, destination workbench, custom schema management, and much more, for you to have total control.
- Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
- Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow.
Method 2: Using Custom Code for Heroku for PostgreSQL to BigQuery Migration
In this Heroku for PostgreSQL to BigQuery integration method, you’ll first move data from Heroku for PostgreSQL to Amazon Redshift, and then from Redshift to BigQuery.
Moving Data from Heroku for PostgreSQL to Amazon Redshift
You can move data from Heroku for PostgreSQL to Amazon Redshift in two ways: with SSL validation or without it. To use SSL validation for the Heroku for PostgreSQL to Amazon Redshift connection, you’ll first have to create a Heroku database. Next, you’ll have to add the Amazon RDS SSL certificate you wish to use for your application. Once you’ve dumped the data and loaded it into Amazon RDS, you’ll have to test the connection.
For more details on the connection, you can refer to the Heroku for PostgreSQL to Redshift article.
Moving Data from Redshift to BigQuery
You’ll be leveraging the BigQuery Data Transfer Service to copy your data from an Amazon Redshift Data Warehouse to Google BigQuery. The BigQuery Data Transfer Service engages migration agents in GKE and triggers an unload operation from Amazon Redshift to a staging area in an Amazon S3 bucket. Your data is then moved from the Amazon S3 bucket to BigQuery.
Here are the steps involved:
- Step 1: Go to the BigQuery page in your Google Cloud Console.
- Step 2: Click on Transfers. On the New Transfer Page you’ll have to make the following choices:
- For Source, you can pick Migration: Amazon Redshift.
- Next, for the Display name, you’ll have to enter a name for the transfer. The display name could be any value that allows you to easily identify the transfer if you have to change the transfer later.
- Finally, for the destination dataset, you’ll have to pick the appropriate dataset.
- Step 3: Next, in Data Source Details, you’ll have to mention specific details for your Amazon Redshift transfer as given below:
- For the JDBC Connection URL for Amazon Redshift, you’ll have to give the JDBC URL to access the cluster.
- Next, you’ll have to enter the username for the Amazon Redshift database you want to migrate.
- You’ll also have to provide the database password.
- For the Secret Access Key and Access Key ID, you need to enter the key pair you got from ‘Grant Access to your S3 Bucket’.
- For Amazon S3 URI, you need to enter the URI of the S3 Bucket you’ll leverage as a staging area.
- Under Amazon Redshift Schema, you can enter the schema you want to migrate.
- For Table Name Patterns, you can specify either a name or a pattern that matches the table names in the Schema. You can use regular expressions to specify patterns, separated by semicolons, in the following form: <table1Regex>;<table2Regex>. Each pattern needs to follow Java regular expression syntax.
- Step 4: Click on Save.
- Step 5: Google Cloud Console will display all the transfer setup details, including a Resource name for this transfer. This completes the Heroku for PostgreSQL to BigQuery export.
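Two of the values in Step 3 are easy to get wrong: the JDBC Connection URL and the Table Name Patterns. The sketch below assembles a URL in the `jdbc:redshift://endpoint:port/database` form and previews which tables a semicolon-separated pattern list would select. It is an approximation under stated assumptions: it uses Python's `re` module rather than Java's regex engine (the two agree on simple patterns but differ on some advanced syntax), and it assumes each pattern must match the whole table name. The function names and sample values are hypothetical.

```python
import re


def redshift_jdbc_url(endpoint: str, port: int, database: str) -> str:
    """Assemble a JDBC URL for an Amazon Redshift cluster."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


def matching_tables(patterns: str, tables: list) -> list:
    """Return the tables matched by a '<regex1>;<regex2>' pattern list."""
    regexes = [re.compile(p) for p in patterns.split(";") if p]
    # Assumption: a pattern must match the entire table name.
    return [t for t in tables if any(r.fullmatch(t) for r in regexes)]


# Hypothetical cluster endpoint and table names, for illustration only.
print(redshift_jdbc_url("examplecluster.abc123.us-west-2.redshift.amazonaws.com", 5439, "dev"))
print(matching_tables("orders_.*;customers", ["orders_2023", "customers", "customers_old", "events"]))
```

Previewing the pattern list against your actual table names before saving the transfer helps you catch patterns that select too many or too few tables.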
What is the Importance of Heroku for PostgreSQL BigQuery Integration?
The benefit of using a Heroku for PostgreSQL to BigQuery connector for integration is two-fold: it allows you to query your data offline, and it makes analysis for data-driven decision making easier. As a fully managed service, Google BigQuery allows you to extract the most out of your data without the admin overhead.
Database forks and follower instances from Heroku for PostgreSQL can turn your data into an agile resource. This can be used for safe experimentation with various use cases. Utilizing Heroku Postgres ETL tools, developers can efficiently migrate and synchronize data across different platforms. Deploying on Heroku Postgres simplifies database management tasks for developers.
With Heroku for PostgreSQL, new data strategies, instances dedicated to analytics, development instances, along with data warehousing are ready to go in just a few clicks.
This blog talks about the different methods you can implement to integrate Heroku for PostgreSQL to BigQuery seamlessly: using custom scripts and a no-code Data Pipeline solution, Hevo.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Comment below to tell us about your experience of connecting Heroku for PostgreSQL to BigQuery!