Are you looking to run operational database workloads on your BigQuery data by transferring it to a PostgreSQL database? Well, you’ve come to the right place. Replicating data from BigQuery to PostgreSQL is now much easier.

This article will provide a quick introduction to PostgreSQL and Google BigQuery. You’ll also learn how to set up your BigQuery to PostgreSQL integration using three different techniques. To figure out which way of connecting BigQuery to PostgreSQL is ideal for you, keep reading.

Introduction to PostgreSQL


PostgreSQL is a high-performance, enterprise-level open-source relational database that enables both SQL (relational) and JSON (non-relational) querying. It’s a very reliable database management system, with more than two decades of community work to thank for its high levels of resiliency, integrity, and accuracy. Many online, mobile, geospatial, and analytics applications utilize PostgreSQL as their primary data storage or data warehouse.

The source code for PostgreSQL is freely available under an open source license, allowing you to use, modify, and deploy it as you see fit. PostgreSQL has no license fees, so there’s no risk of over-deployment and no cost for unused software. PostgreSQL’s passionate community of contributors and enthusiasts finds problems and releases patches on a regular basis, adding to the database system’s overall security.

To know more about PostgreSQL, visit this link.

Benefits of PostgreSQL

  • Code quality: Every line of code that goes into PostgreSQL is evaluated by numerous specialists, and the whole development process is community-driven, allowing issue reporting, patches, and verification to happen promptly.
  • SQL and NoSQL: PostgreSQL can store JSON documents as well as serve as a conventional SQL relational database management system for rows of transactional or statistical data. This adaptability can cut costs while also improving your security. You won’t need to hire or contract for the skills needed to set up, administer, protect, and upgrade multiple database systems if you use just one database management system.
  • Data Availability & Resiliency: Privately supported versions of PostgreSQL provide extra high availability, resilience, and security capabilities for mission-critical production settings, such as government agencies, financial services corporations, and healthcare organizations.
  • Geographic data: Because Postgres has some outstanding features for managing spatial data, businesses frequently rely on it for applications that work with geographic information. Postgres, for example, offers dedicated data types for geometric objects, and PostGIS makes creating geographical databases simple and quick. As a result, Postgres has been particularly popular with transportation and logistics organizations.

Introduction to Google BigQuery 


Google BigQuery is a serverless, cost-effective, and massively scalable Data Warehousing platform that includes built-in Machine Learning capabilities and an in-memory BI Engine for accelerating analytical queries. It combines fast SQL queries with the processing power of Google’s infrastructure to manage business transactions, data from several databases, and access control limits for people viewing and querying data.

UPS, Twitter, and Dow Jones are some companies that extensively utilize BigQuery. For instance, UPS uses BigQuery to forecast the precise amount of shipments it will receive for its various services. And, Twitter uses BigQuery to assist with ad changes and the aggregation of millions of data points per second.

To know more about Google BigQuery, visit this link.

Methods to Set up BigQuery to PostgreSQL Integration

There are three methods for replicating data from BigQuery to PostgreSQL.

Method 1: Using Hevo Data to Set Up BigQuery to PostgreSQL Integration

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources) to destinations like PostgreSQL Databases, we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready. Its fault-tolerant architecture guarantees that data is handled securely and consistently, with no data loss.

Sign up here for a 14-day free trial!

The following are the steps to load data from BigQuery to PostgreSQL using Hevo Data:

  • Step 1: Link your Google Cloud Platform account to Hevo’s platform. Hevo includes a built-in BigQuery integration that allows you to connect to your account in minutes.
  • Step 2: Choose PostgreSQL as your destination and begin transferring data.

To read more about PostgreSQL as a destination connector in Hevo, read this documentation.

When I saw Hevo, I was amazed by the smoothness with which it worked so many different sources with zero data loss.

– Swati Singhi, Lead Engineer, Curefit

Method 2: Manual ETL Process to Set Up BigQuery to PostgreSQL Integration

Note: Enable your PostgreSQL database to accept connections from Cloud Data Fusion before you begin. We recommend using a private Cloud Data Fusion instance to perform this safely.

Step 1: Open your Cloud Data Fusion instance.

In your Cloud Data Fusion instance, store your PostgreSQL password as a secure key so that it is kept encrypted. See Cloud KMS for additional information on keys.

  • Go to the Cloud Data Fusion Instances page in the Google Cloud console.
  • To open your instance in the Cloud Data Fusion UI, click View instance.

Step 2: Save your PostgreSQL password as a protected key.

  • Click System admin > Configuration in the Cloud Data Fusion UI.
  • Click the Make HTTP Calls button.
  • Select PUT from the dropdown list.
  • Enter namespaces/default/securekeys/pg_password in the path field.
  • Enter {"data":"POSTGRESQL_PASSWORD"} in the Body field, replacing POSTGRESQL_PASSWORD with your PostgreSQL password.
  • Hit Send.
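The UI steps above amount to a single REST call against the instance’s securekeys endpoint. A minimal Python sketch of how that request could be assembled (the instance API base URL is a placeholder and authentication is omitted; actually sending the request is left out):

```python
import json

def build_securekey_request(api_base: str, key_name: str, password: str):
    """Build the URL and JSON body for the PUT call that stores a
    password as a secure key in the default namespace."""
    url = f"{api_base}/namespaces/default/securekeys/{key_name}"
    body = json.dumps({"data": password})
    return url, body

# Mirrors the form fields above: PUT method, the given path, and the JSON body.
url, body = build_securekey_request(
    "https://INSTANCE_API_BASE",  # placeholder for your instance's API endpoint
    "pg_password",
    "POSTGRESQL_PASSWORD",  # replace with your PostgreSQL password
)
```
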

Step 3: Connect to Cloud SQL For PostgreSQL.

  • Click the menu in the Cloud Data Fusion UI and go to the Wrangler page.
  • Click the Add connection button.
  • To connect, select Database as the source type.
  • Click Upload under Google Cloud SQL for PostgreSQL.
  • Upload a JAR file containing your PostgreSQL JDBC driver. The file name must follow the format NAME-VERSION.jar; if it doesn’t, rename the file before uploading.
  • Click the Next button.
  • Fill in the fields with the driver’s name, class, and version.
  • Click the Finish button.
  • Click Google Cloud SQL for PostgreSQL in the Add connection box that appears. Under Google Cloud SQL for PostgreSQL, your JAR name should display.
  • In the Connection string field, enter your connection string.
  • Replace the following:
    • DATABASE_NAME: the Cloud SQL database name as listed in the Databases tab of the instance details page.
    • INSTANCE_CONNECTION_NAME: the Cloud SQL instance connection name as displayed in the Overview tab of the instance details page.

Example:
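The example connection string itself did not survive here. As a hedged reconstruction based on the Cloud SQL socket factory pattern (treat the exact parameters, including the user field, as assumptions to verify against the Cloud Data Fusion documentation; pg_password is the secure key saved in Step 2):

```
jdbc:postgresql://google/DATABASE_NAME?cloudSqlInstance=INSTANCE_CONNECTION_NAME&socketFactory=com.google.cloud.sql.postgres.SocketFactory&user=USERNAME&password=${secure(pg_password)}
```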

See Manage access for additional information on granting roles.

  • To check that the database connection can be made, click Test connection.
  • Click the Add connection button.
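As a quick aid for the driver-naming requirement in this step, here is a small hypothetical helper (the regular expression is an illustrative assumption based on the NAME-VERSION.jar rule stated above):

```python
import re

# NAME-VERSION.jar: a name, a hyphen, then a version starting with a digit.
# This pattern is an illustrative assumption based on the rule above.
JAR_NAME_PATTERN = re.compile(r"^[A-Za-z][\w.-]*-\d[\w.]*\.jar$")

def is_valid_jar_name(filename: str) -> bool:
    """Return True if the file name follows the NAME-VERSION.jar format."""
    return bool(JAR_NAME_PATTERN.match(filename))

print(is_valid_jar_name("postgresql-42.6.0.jar"))  # True
print(is_valid_jar_name("postgresql.jar"))  # False: no version component
```
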

Limitations of Using Cloud Data Fusion

  • JDBC Connection Failures: If you are starting Cloud Data Fusion for the first time in your project, you must peer your VPC with the Data Fusion tenant project; until you do, you will receive connection failure errors even if private services access is enabled in the VPC.
  • Worker Nodes Count: To run a single-node Dataproc cluster, you must set the worker node count to 0, while a multi-node cluster requires at least two worker nodes.
  • Minimum Memory: Both the master and worker nodes of the Dataproc cluster need at least 3.5 GB of memory.

Method 3: Using Apache Airflow to Set Up BigQuery to PostgreSQL Integration

Google Cloud BigQuery is Google Cloud’s serverless data warehouse solution. PostgreSQL is an open-source RDBMS. This operator allows you to copy data from a BigQuery table to PostgreSQL.

Prerequisite Tasks

To use this operator, install the Apache Airflow Google provider package:

pip install 'apache-airflow[google]'

Operator

The BigQueryToPostgresOperator operator copies data from a BigQuery table to a PostgreSQL table.

To define variables dynamically, use Jinja templating with the target_table_name, impersonation_chain, dataset_id, and table_id.

You can use the parameter selected_fields to limit the fields that are copied (all fields by default) and the parameter replace to overwrite the destination table rather than append to it. For more information, see the Airflow Google provider documentation.

Transferring data

The following operator moves data from a BigQuery table to PostgreSQL.

from airflow.providers.google.cloud.transfers.bigquery_to_postgres import (
    BigQueryToPostgresOperator,
)

bigquery_to_postgres = BigQueryToPostgresOperator(
    task_id="bigquery_to_postgres",
    dataset_table=f"{DATASET_NAME}.{TABLE}",  # source BigQuery table, as "dataset.table"
    target_table_name=destination_table,  # destination PostgreSQL table
    replace=False,  # append to the destination table rather than overwrite it
)

Limitations of Using Apache Airflow

  • Complex configuration: When setting up Apache Airflow for production use, various difficult activities must be completed manually. This includes installing Airflow components, configuring databases, and managing schemas with Airflow db commands.
  • Steep learning curve: Apache Airflow is a powerful tool for orchestrating complicated computational workflows, but it has a high learning curve, particularly for individuals unfamiliar with the concept of programmatic writing, scheduling, and monitoring workloads.
  • Scaling Limitations: Although Apache Airflow is a robust platform for orchestrating complicated activities, it, like any other system, has scaling constraints. 

That’s it for connecting BigQuery to PostgreSQL. Now, it’s your turn to decide which method suits your requirements.

Use Cases of BigQuery to PostgreSQL Migration

Some of the applications of BigQuery to PostgreSQL migration are:

  • Manufacturing: Migrating to PostgreSQL can help manufacturers optimize the supply chain, foster innovation, and make manufacturing customer-centric.
  • Data Analytics: It can facilitate data integrity and support all data types to boost analytics performance.
  • Web Applications: PostgreSQL is a go-to choice for building web applications when used with other open-source technologies such as Linux, Apache, PHP, and Perl (a LAMP-style stack with PostgreSQL as the database).
  • E-commerce: The scalability and reliability of PostgreSQL make it well-suited for e-commerce applications that handle large volumes of product data and transaction records.

Conclusion

This article offers an overview of PostgreSQL and BigQuery, as well as a description of their features. Furthermore, it described three approaches for transferring data from BigQuery to PostgreSQL. Although effective, the manual approaches take considerable time and resources. Data migration from BigQuery to PostgreSQL is a time-consuming and tedious operation, but with the help of a data integration solution like Hevo, it can be done with little work and in no time.

Visit our Website to Explore Hevo

Businesses can use automated platforms like Hevo Data to set up this integration and handle the ETL process. It helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tool, or any other desired destination in a fully automated and secure manner without having to write any code, providing you a hassle-free experience.

Moreover, Hevo offers a fully-managed solution to set up data integration from 150+ data sources (including 40+ free data sources) and will let you directly load data to the destination of your choice. It will automate your data flow in minutes without a single line of code.

Not sure about purchasing a plan? Sign Up for a 14-day full feature access trial and simplify your Data Ingestion & Integration process. You can also check out our unbeatable pricing and decide the best plan for your needs. 

Let us know what you think in the comments section below, and if you have anything to add, please do so.

Akshaan Sehgal
Former Marketing Content Analyst, Hevo Data

Akshaan is a data science enthusiast who loves to embrace challenges associated with maintaining and exploiting growing data stores. He has a flair for writing in-depth articles on data science where he incorporates his experience in hands-on training and guided participation in effective data management tasks.

No-code Data Pipeline for Replicating BigQuery Data to PostgreSQL

Get Started with Hevo