Heroku is a cloud Platform as a Service (PaaS) that allows developers to build, run, and operate applications, with support for programming languages such as Node.js, Scala, Ruby, Python, and Go. Heroku for PostgreSQL offers an advanced open-source database as a secure, trusted, and scalable service optimized for developers. To analyze the data you pull from this database, you'll have to move it to a fully managed data warehouse that allows you to extract valuable, actionable insights from it. This is where Google BigQuery comes in: a highly scalable data warehouse with a built-in query engine.
In this article, you’ll go through two methods to seamlessly move data from Heroku for PostgreSQL to BigQuery: using custom code and a no-code Data Pipeline solution, Hevo.
What is Heroku for PostgreSQL?
As Heroku grew, there was a pressing need for database services that complement the features offered by the Heroku platform, like on-demand usage at low cost and simple provisioning. This is how Heroku for PostgreSQL came into being. Heroku chose PostgreSQL over MySQL because it considered PostgreSQL more operationally reliable than MySQL, which is pivotal when managing hundreds of thousands of databases on a regular basis.
Another key reason for choosing PostgreSQL was that it was, and will continue to remain, an open-source tool. This means that as long as you are leveraging PostgreSQL as your database server, you'll never be subject to vendor lock-in, which gives you the freedom to take your data wherever you please. Here are a few more reasons that make PostgreSQL a no-brainer for Heroku's service (each is illustrated in the short sketch after this list):
- Concurrent Indexes: When you create an index with most traditional databases, the database holds a lock on the table while the index is being built, which leaves the table more or less unusable during that time. This isn't a serious problem when you're just starting out, but as your data grows and you add indexes to boost performance, it can cause downtime just to add an index (not ideal in a production environment). PostgreSQL has an elegant way of adding an index without holding that lock: you call CREATE INDEX CONCURRENTLY instead of CREATE INDEX. There's a small catch, though: building the index concurrently may take two or three times as long, and it can't be done inside a transaction.
- Transactional DDL: If you've ever modified your database and had something fail halfway through, whether due to a constraint or something else, you know how cumbersome it is to unravel the resulting mess. Schema changes are usually meant to run as a whole, and if they fail, you want to roll back completely. PostgreSQL supports wrapping your DDL in a transaction: if an error occurs, you can simply roll back and have the preceding DDL statements rolled back with it, keeping your data and schema migrations safe and your application in a consistent state.
- Partial Indexing: Sometimes you only care about a portion of your data, or you want to enforce a constraint only where a certain condition is true. PostgreSQL lets you add a WHERE clause to an index definition so that only the matching rows are indexed, keeping the index small and fast.
- Extensibility: Through extensions, PostgreSQL offers supplementary capabilities such as JSON data types, geospatial support, key/value stores, and connections to external data sources (for instance, Redis, Oracle, and MySQL).
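The sketch below illustrates these features from Python using psycopg2. It is a minimal, hedged example: the orders table, its columns, and the hstore extension are assumptions for illustration only, while DATABASE_URL is the config var Heroku sets for attached databases.

```python
# Minimal sketch of the PostgreSQL features above, using psycopg2.
# The table/column names and the hstore extension are illustrative assumptions.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # Heroku exposes the DSN here

# 1. Concurrent index: CREATE INDEX CONCURRENTLY cannot run inside a transaction,
#    so it must be executed with autocommit enabled.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id "
                "ON orders (customer_id);")

# 2. Partial index: index only the rows you actually query (unshipped orders here).
with conn.cursor() as cur:
    cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_unshipped "
                "ON orders (created_at) WHERE shipped_at IS NULL;")

# 3. Extensibility: enable a supported extension such as hstore.
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore;")

# 4. Transactional DDL: run schema changes inside a transaction and roll back
#    everything if any statement fails.
conn.autocommit = False
try:
    with conn.cursor() as cur:
        cur.execute("ALTER TABLE orders ADD COLUMN delivered_at timestamptz;")
        cur.execute("ALTER TABLE orders ADD CONSTRAINT delivered_after_shipped "
                    "CHECK (delivered_at IS NULL OR delivered_at >= shipped_at);")
    conn.commit()
except psycopg2.Error:
    conn.rollback()  # the schema is left exactly as it was before the block ran
    raise
finally:
    conn.close()
```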
What is Google BigQuery?
Google BigQuery is Google's data warehousing solution. As part of the Google Cloud Platform, it is a fully managed warehouse that, like Amazon Redshift, is queried with SQL, which makes it a natural place to assemble data for businesses already building on Google's platform.
You can easily interact with Google BigQuery through its web user interface or its command-line tool, and Google also provides client libraries in several languages for working with BigQuery from your application.
Google BigQuery stores data in a columnar format optimized for analytical queries. It presents data as tables with rows and columns and provides full database transaction semantics (ACID). To ensure high availability, BigQuery storage is automatically replicated across multiple locations.
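To make the client-library point concrete, here is a minimal sketch that runs a query with the official google-cloud-bigquery Python library; the project, dataset, and table names are placeholders.

```python
# Minimal sketch: querying BigQuery from Python with the official client library.
# Project, dataset, and table names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # uses Application Default Credentials

query = """
    SELECT customer_id, COUNT(*) AS order_count
    FROM `my-gcp-project.analytics.orders`
    GROUP BY customer_id
    ORDER BY order_count DESC
    LIMIT 10
"""

for row in client.query(query).result():  # .result() waits for the job to finish
    print(row.customer_id, row.order_count)
```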
Key Features of Google BigQuery
- Federated Query: Google BigQuery can send a query statement to an external data source and receive the results as a temporary table, so data stored in Bigtable, Google Cloud Storage (GCS), or Google Drive can be queried in place (see the sketch after this list).
- Flexible Scaling: You don't have to explicitly tune a cluster with BigQuery: computing resources are adjusted automatically according to the workload, and storage can be extended to petabytes on demand. Patching, updates, and the scaling of compute and storage resources are all handled by Google BigQuery, making it a fully managed service.
- Programmatic Access: Google BigQuery can be accessed from applications through its REST API, client libraries (such as Java, .NET, and Python), the command-line tool, or the GCP Console. It also includes query and database management tools.
- Storage: Google BigQuery stores your data in Colossus, Google's global storage system, using the opinionated Capacitor columnar format. Behind the scenes, BigQuery continually re-optimizes how data is stored; this background work consumes a large amount of CPU and RAM on Google's side, but it happens with no downtime and without affecting your query performance or your bill.
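As an illustration of the Federated Query feature above, the following sketch defines an external table over CSV files sitting in a Cloud Storage bucket and then queries it with standard SQL, using the Python client library. The bucket path, project, dataset, and table names are assumptions for illustration.

```python
# Sketch: querying CSV files in Cloud Storage through a BigQuery external table.
# Bucket, project, dataset, and table names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

table = bigquery.Table("my-gcp-project.analytics.events_external")
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-bucket/events/*.csv"]
external_config.autodetect = True            # let BigQuery infer the schema
table.external_data_configuration = external_config

client.create_table(table, exists_ok=True)   # no data is copied; the files stay in GCS

rows = client.query(
    "SELECT COUNT(*) AS n FROM `my-gcp-project.analytics.events_external`"
).result()
print(next(iter(rows)).n)
```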
What is the Importance of Heroku for PostgreSQL to BigQuery Integration?
The benefit of using a Heroku for PostgreSQL to BigQuery connector is two-fold: it lets you run further queries offline at warehouse scale, and it makes analysis for data-driven decision-making easier. As a fully managed service, Google BigQuery allows you to extract the most out of your data without the admin overhead. Database forks and follower instances from Heroku for PostgreSQL can turn your data into an agile resource that can be used for safe experimentation with various use cases.
With Heroku for PostgreSQL, new data strategies, instances dedicated to analytics, development instances, and data warehousing are ready to go in just a few clicks.
Heroku for PostgreSQL to BigQuery Integration Methods
Here are the two methods you can implement for Heroku for PostgreSQL to BigQuery migration:
Method 1: Using Hevo as a Heroku for PostgreSQL to BigQuery Connector
Hevo is a fully-managed, Automated No-code Data Pipeline that can load data from 150+ Sources (including 40+ free sources) such as Heroku for PostgreSQL to BigQuery.
SIGN UP HERE FOR A 14-DAY FREE TRIAL
Hevo can also enrich and transform the data into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Using Hevo, Heroku for PostgreSQL to BigQuery Migration can be done in the following 2 steps:
Configure Heroku for PostgreSQL as a Source
- First, you need to log in to your Heroku account.
- Next, you need to choose the app containing the PostgreSQL database and open the databases dashboard.
- You can access the DATA tab and click on the PostgreSQL database you wish to use.
- Next, click on Settings > View Credentials.
- You can use these credentials while setting up your PostgreSQL Source in Hevo.
- Next, on the Configure your Heroku PostgreSQL Source page, you need to specify the following:
- Database Host: The Heroku PostgreSQL host’s DNS or IP address.
- Pipeline Name: A unique name for your Pipeline.
- Database User: The read-only user who has the permissions to read tables in your database.
- Database Port: This refers to the port on which your PostgreSQL server is listening for connections. The default value is 5432.
- Database Password: This refers to the password for the read-only user.
- Database Name: The database that you want to replicate.
- Select an Ingestion Mode: This refers to the desired mode by which you want to ingest data from the source. The available ingestion modes are Table, Logical Replication, and Custom SQL.
- For the Logical Replication ingestion mode, follow the steps in the documentation for your PostgreSQL variant to set up logical replication.
- For the Table ingestion mode, refer to the Object Settings section for steps to configure the objects you want to replicate.
- Connection Settings:
- Use SSL: Enable this to use an SSL-encrypted connection. You should enable it for Heroku PostgreSQL databases. To enable it, you need to specify the following:
- Client Certificate: This refers to the client public key certificate file.
- CA File: The file containing the SSL server certificate authority (CA).
- Client Key: The client private key file.
- Connect through SSH: You can enable this option to connect Hevo using an SSH tunnel, as opposed to directly connecting your PostgreSQL database host to Hevo. This lends an additional layer of security to your database by not exposing your PostgreSQL setup to the public. If this option is disabled, you need to whitelist Hevo’s IP address.
- Click TEST & CONTINUE to proceed with setting up the Destination.
Configure BigQuery as a Destination
To set up Google BigQuery as a destination in Hevo, follow these steps:
- Step 1: In the Asset Palette, select DESTINATIONS.
- Step 2: In the Destinations List View, click + CREATE.
- Step 3: Select Google BigQuery from the Add Destination page.
- Step 4: Choose the BigQuery connection authentication method on the Configure your Google BigQuery Account page.
- Step 5: Choose one of these:
- To connect using a Service Account:
- Attach the Service Account Key file.
- Note that Hevo only accepts key files in JSON format.
- Click CONFIGURE GOOGLE BIGQUERY ACCOUNT.
- To connect using a User Account:
- Click + to add a Google BigQuery account.
- Log in as a user with BigQuery Admin and Storage Admin permissions.
- Click Allow to grant Hevo access to your data.
- Step 6: Set the following parameters on the Configure your Google BigQuery page:
- Destination Name: A unique name for your Destination.
- Project ID: The ID of the BigQuery project into which the data will be loaded and for which you granted permissions while connecting your account.
- Dataset ID: The name of the dataset to which you want to sync your data.
- GCS Bucket: The Cloud Storage bucket in which files must be staged before being uploaded to BigQuery.
- Enable Streaming Inserts: Enable this option to stream data to your BigQuery Destination as it arrives from the Source, rather than loading it via a job on the defined Pipeline schedule. To learn more, see Near Real-time Data Loading Using Streaming. This setting cannot be changed later.
- Sanitize Table/Column Names: Enable this option to replace spaces and non-alphanumeric characters in table and column names with underscores (_). See Name Sanitization for details.
- Step 7: Click TEST CONNECTION to test connectivity with the BigQuery warehouse.
- Step 8: Once the test is successful, click SAVE DESTINATION to complete the Heroku for PostgreSQL to BigQuery integration.
Using manual scripts and custom code to move data into the warehouse is cumbersome. Frequent breakages, pipeline errors, and lack of data flow monitoring make scaling such a system a nightmare. Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work.
- Reliability at Scale: With Hevo, you get a world-class fault-tolerant architecture that scales with zero data loss and low latency.
- Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every stat of pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs.
- Stay in Total Control: When automation isn’t enough, Hevo offers flexibility – data ingestion modes, ingestion and load frequency, JSON parsing, destination workbench, custom schema management, and much more – for you to have total control.
- Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
- 24×7 Customer Support: With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round-the-clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day full-featured free trial.
- Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow.
Get started for Free with Hevo!
Method 2: Using Custom Code for Heroku for PostgreSQL to BigQuery Migration
In this Heroku for PostgreSQL to BigQuery integration method, you’ll first move data from Heroku for PostgreSQL to Amazon Redshift, and then from Redshift to BigQuery.
Moving Data from Heroku for PostgreSQL to Amazon Redshift
You can move data from Heroku for PostgreSQL to Amazon Redshift in two ways: with or without SSL validation. If you want to use SSL validation to establish the connection, you'll first have to create your Heroku database and add the Amazon SSL certificate you wish to use for your application. Once you've dumped the data from Heroku and loaded it into Amazon Redshift, you'll have to test the connection.
For more details on the connection, you can refer to the Heroku for PostgreSQL to Redshift article.
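For a rough idea of what the dump-and-load path looks like in practice, here is a hedged sketch that exports a table from Heroku for PostgreSQL as CSV, stages it in Amazon S3, and loads it into Amazon Redshift with a COPY command. Every connection string, bucket, table, and IAM role below is a placeholder, and in production you would prefer IAM roles and proper secret management over inline credentials.

```python
# Rough sketch of the dump-and-load path: Heroku Postgres -> CSV -> S3 -> Redshift COPY.
# All connection strings, names, and the IAM role ARN are placeholders.
import os
import boto3
import psycopg2

# 1. Export a table from Heroku for PostgreSQL to a local CSV file.
with psycopg2.connect(os.environ["HEROKU_DATABASE_URL"]) as pg_conn:
    with pg_conn.cursor() as cur, open("/tmp/orders.csv", "w") as f:
        cur.copy_expert("COPY orders TO STDOUT WITH (FORMAT csv, HEADER false)", f)

# 2. Stage the file in an S3 bucket.
boto3.client("s3").upload_file("/tmp/orders.csv", "my-staging-bucket", "heroku/orders.csv")

# 3. Load it into Redshift with COPY (the target table must already exist).
redshift = psycopg2.connect(os.environ["REDSHIFT_DSN"])
redshift.autocommit = True
with redshift.cursor() as cur:
    cur.execute("""
        COPY orders
        FROM 's3://my-staging-bucket/heroku/orders.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
        FORMAT AS CSV;
    """)
redshift.close()
```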
Moving Data from Redshift to BigQuery
You’ll be leveraging the BigQuery Data Transfer Service to copy your data from an Amazon Redshift data warehouse to Google BigQuery. The Transfer Service engages migration agents in GKE and triggers an unload operation from Amazon Redshift to a staging area in an Amazon S3 bucket; your data is then moved from the S3 bucket to BigQuery.
Here are the steps involved:
- Step 1: Go to the BigQuery page in your Google Cloud Console.
- Step 2: Click on Transfers. On the New Transfer Page you’ll have to make the following choices:
- For Source, you can pick Migration: Amazon Redshift.
- Next, for the Display name, you’ll have to enter a name for the transfer. The display name could be any value that allows you to easily identify the transfer if you have to change the transfer later.
- Finally, for the destination dataset, you’ll have to pick the appropriate dataset.
- Step 3: Next, in Data Source Details, you’ll have to mention specific details for your Amazon Redshift transfer as given below:
- For the JDBC Connection URL for Amazon Redshift, you’ll have to give the JDBC URL to access the cluster.
- Next, you’ll have to enter the username for the Amazon Redshift database you want to migrate.
- You’ll also have to provide the database password.
- For the Secret Access Key and Access Key ID, you need to enter the key pair you got from ‘Grant Access to your S3 Bucket’.
- For Amazon S3 URI, you need to enter the URI of the S3 Bucket you’ll leverage as a staging area.
- Under Amazon Redshift Schema, you can enter the schema you want to migrate.
- For Table Name Patterns, you can either specify a pattern or name for matching the table names in the Schema. You can leverage regular expressions to specify the pattern in the following form: <table1Regex>;<table2Regex>. The pattern needs to follow Java regular expression syntax.
- Step 4: Click on Save.
- Step 5: The Google Cloud Console will display all the transfer setup details, including a Resource name for this transfer, which completes the Heroku for PostgreSQL to BigQuery export.
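The same transfer can also be created programmatically with the Python client for the BigQuery Data Transfer Service. The sketch below is a hedged example: the project, dataset, JDBC URL, S3 URI, and credentials are placeholders, and the exact parameter keys for the Amazon Redshift data source are assumptions that should be verified against the current Transfer Service documentation.

```python
# Sketch: creating the Amazon Redshift -> BigQuery transfer with the Python client.
# All values are placeholders; the parameter keys should be verified against the
# BigQuery Data Transfer Service documentation for Amazon Redshift.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="heroku_pg_replica",
    display_name="Redshift to BigQuery migration",
    data_source_id="redshift",
    params={
        "jdbc_url": "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
        "database_username": "migration_user",
        "database_password": "********",
        "access_key_id": "AKIA...",
        "secret_access_key": "********",
        "s3_bucket": "s3://my-staging-bucket",
        "redshift_schema": "public",
        "table_name_patterns": "orders;customers",
    },
)

created = client.create_transfer_config(
    parent=client.common_project_path("my-gcp-project"),
    transfer_config=transfer_config,
)
print("Created transfer:", created.name)  # the Resource name mentioned in Step 5
```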
Conclusion
This blog discussed the different methods you can implement to integrate Heroku for PostgreSQL with BigQuery seamlessly: custom scripts and a no-code Data Pipeline solution, Hevo. It also gave a brief introduction to the key features and benefits of Heroku for PostgreSQL and Google BigQuery before diving into the integration methods.
Visit our Website to Explore Hevo
Hevo will automate your data transfer process, allowing you to focus on other aspects of your business like Analytics, Customer Management, etc. Hevo provides a wide range of sources – 150+ Data Sources (including 40+ Free Sources) such as Heroku for PostgreSQL – that connect with 15+ Destinations such as Google BigQuery. It will provide you with a seamless experience and make your work life much easier.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!