Are you looking to run operational database workloads on your BigQuery data by moving it into a PostgreSQL database? Well, you’ve come to the right place: replicating data from BigQuery to PostgreSQL is now much easier.

This article will provide a quick introduction to PostgreSQL and Google BigQuery. You’ll also learn how to set up your BigQuery to PostgreSQL integration using three different techniques. To figure out which way of connecting BigQuery to PostgreSQL is ideal for you, keep reading.

Introduction to PostgreSQL


PostgreSQL is a high-performance, enterprise-level open-source relational database that enables both SQL (relational) and JSON (non-relational) querying. It’s a very reliable database management system, with more than two decades of community work to thank for its high levels of resiliency, integrity, and accuracy. Many online, mobile, geospatial, and analytics applications utilize PostgreSQL as their primary data storage or data warehouse.

The source code for PostgreSQL is freely available under an open-source license, allowing you to use, modify, and deploy it as you see fit. PostgreSQL has no license fees, so there’s no risk of over-deployment and no cost for unused software. PostgreSQL’s passionate community of contributors and enthusiasts finds and patches problems regularly, adding to the database system’s overall security.

Benefits of PostgreSQL

  • Code quality: Every line of code that goes into PostgreSQL is evaluated by numerous specialists, and the whole development process is community-driven, allowing issue reporting, patches, and verification to happen promptly.
  • SQL and NoSQL: PostgreSQL can store JSON documents alongside the rows of transactional or analytical data you’d keep in a typical SQL relational database (see the sketch after this list). This adaptability can cut costs while also improving your security: with a single database management system, you won’t need to hire or contract separate expertise to set up, administer, protect, and upgrade multiple systems.
  • Data Availability & Resiliency: Commercially supported versions of PostgreSQL provide extra high availability, resilience, and security capabilities for mission-critical production settings, such as government agencies, financial services corporations, and healthcare organizations.
  • Geographic data: Postgres has outstanding features for managing spatial data, so businesses frequently rely on it for applications that work with geographic information. For example, Postgres offers dedicated data types for geometric objects, and the PostGIS extension makes building geographic databases simple and quick. As a result, Postgres is particularly popular with transportation and logistics organizations.
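
To make the SQL-plus-JSON point concrete, here is a minimal sketch using the psycopg2 client. The table, column names, and connection settings are illustrative assumptions, not part of any setup described in this article.

import psycopg2
from psycopg2.extras import Json

# Illustrative connection settings; replace with your own.
conn = psycopg2.connect(host="localhost", dbname="demo", user="postgres", password="secret")
cur = conn.cursor()

# One table holds both relational columns and a JSONB document.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id serial PRIMARY KEY,
        occurred_at timestamptz NOT NULL,
        payload jsonb NOT NULL
    )
""")
cur.execute(
    "INSERT INTO events (occurred_at, payload) VALUES (now(), %s)",
    [Json({"type": "click", "page": "/pricing"})],
)

# Query inside the JSON document with the ->> operator.
cur.execute("SELECT id FROM events WHERE payload->>'type' = %s", ["click"])
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()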

Introduction to Google BigQuery 


Google BigQuery is a serverless, cost-effective, and massively scalable data warehousing platform with built-in Machine Learning capabilities, and its BI Engine provides fast, in-memory analysis. It combines fast SQL queries with the processing power of Google’s infrastructure, while letting you set access controls over who can view and query your data.

UPS, Twitter, and Dow Jones are some companies that extensively utilize BigQuery. For instance, UPS uses BigQuery to forecast the precise volume of packages it will receive across its various services, and Twitter uses BigQuery to assist with ad changes and the aggregation of millions of data points per second.


Methods to Set up BigQuery to PostgreSQL Integration

There are three methods for replicating data from BigQuery to PostgreSQL.

Method 1: Manual ETL Process Using Cloud Data Fusion to Set Up BigQuery to PostgreSQL Integration

Note: Enable your PostgreSQL database to accept connections from Cloud Data Fusion before you begin. We recommend using a private Cloud Data Fusion instance to perform this safely.

Step 1: Open your Cloud Data Fusion instance.

In your Cloud Data Fusion instance, you will store your PostgreSQL password as a secure key so that it is encrypted. See Cloud KMS for additional information on keys.

  • Go to the Cloud Data Fusion Instances page in the Google Cloud console.
  • To open your instance in the Cloud Data Fusion UI, click View instance.

Step 2: Save your PostgreSQL password as a secure key.

  • Click System admin > Configuration in the Cloud Data Fusion UI.
  • Click the Make HTTP Calls button.
  • Select PUT from the dropdown list.
  • Enter namespaces/default/securekeys/pg_password in the path field.
  • Enter "data":"POSTGRESQL_PASSWORD" in the Body field. POSTGRESQL_PASSWORD should be replaced with your PostgreSQL password.
  • Click Send. (A scripted equivalent of this call is sketched below.)
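
If you prefer to script this step rather than use the UI, the same secure-keys endpoint can be called directly. The following is a minimal sketch, assuming application-default credentials and a placeholder INSTANCE_API_ENDPOINT; the exact endpoint URL comes from your Data Fusion instance details.

import google.auth
import google.auth.transport.requests
import requests

# Assumption: the CDAP API endpoint of your Cloud Data Fusion instance.
INSTANCE_API_ENDPOINT = "https://INSTANCE_URL/api"

# Authenticate with application-default credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# PUT the password under the secure key name referenced later as ${secure(pg_password)}.
resp = requests.put(
    f"{INSTANCE_API_ENDPOINT}/v3/namespaces/default/securekeys/pg_password",
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"description": "PostgreSQL password", "data": "POSTGRESQL_PASSWORD"},
)
resp.raise_for_status()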

Step 3: Connect to Cloud SQL for PostgreSQL.

  • Click the menu in the Cloud Data Fusion UI and go to the Wrangler page.
  • Click the Add connection button.
  • To connect, select Database as the source type.
  • Click Upload under Google Cloud SQL for PostgreSQL.
  • Upload a JAR file containing your PostgreSQL JDBC driver. The file must be named in the NAME-VERSION.jar format; if yours doesn’t match, rename it before uploading.
  • Click the Next button.
  • Fill in the fields with the driver’s name, class, and version (for the standard PostgreSQL JDBC driver, the class name is org.postgresql.Driver).
  • Click the Finish button.
  • In the Add connection box that appears, click Google Cloud SQL for PostgreSQL. Your JAR name should now be displayed under Google Cloud SQL for PostgreSQL.
  • In the Connection string field, enter your JDBC connection string (see the example below).
  • Replace the following:
    • DATABASE_NAME: the Cloud SQL database name as listed in the Databases tab of the instance details page.
    • INSTANCE_CONNECTION_NAME: the Cloud SQL instance connection name as displayed in the Overview tab of the instance details page.

Example:
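
A representative connection string, assuming the standard PostgreSQL JDBC driver with the Cloud SQL socket factory and the pg_password secure key created in Step 2 (user=postgres is an assumption; use your own database user):

jdbc:postgresql://google/DATABASE_NAME?cloudSqlInstance=INSTANCE_CONNECTION_NAME&socketFactory=com.google.cloud.sql.postgres.SocketFactory&user=postgres&password=${secure(pg_password)}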

See Manage access for additional information on granting roles.

  • To check that the database connection can be made, click Test connection.
  • Click the Add connection button.

Limitations of Using Cloud Data Fusion

  • JDBC Connection Failures: The first time you use Data Fusion in a project, you must peer your VPC network with the Data Fusion tenant project. Until that peering is in place, you will receive connection failure errors even if private services access is enabled on the VPC.
  • Worker Node Count: To run a single-node Dataproc cluster, you must set the worker node count to 0; a multi-node cluster requires at least two worker nodes.
  • Minimum Memory: Both master and worker nodes of the Dataproc cluster need at least 3.5 GB of memory.

Method 2: Using Apache Airflow to Set Up BigQuery to PostgreSQL Integration

Google Cloud BigQuery is Google Cloud’s serverless data warehouse solution, and PostgreSQL is an open-source RDBMS. The operator described below lets you copy data from a BigQuery table to PostgreSQL.

Prerequisite Tasks

To use this operator, you must first install the Airflow Google provider package:

pip install 'apache-airflow[google]'

Operator

The BigQueryToPostgresOperator operator copies data from a BigQuery table to a PostgreSQL table.

To define values dynamically, you can use Jinja templating with the target_table_name, impersonation_chain, dataset_id, and table_id parameters.

You can use the selected_fields parameter to limit which fields are copied (all fields by default) and the replace parameter to overwrite the destination table rather than append to it; a variant using selected_fields is sketched after the example below. For more information, see the Airflow Google provider documentation.

Transferring data

The following operator moves data from a BigQuery table to PostgreSQL.

from airflow.providers.google.cloud.transfers.bigquery_to_postgres import (
    BigQueryToPostgresOperator,
)

# Copy every row of DATASET_NAME.TABLE into the PostgreSQL table named by
# destination_table, appending to any rows already there (replace=False).
bigquery_to_postgres = BigQueryToPostgresOperator(
    task_id="bigquery_to_postgres",
    dataset_table=f"{DATASET_NAME}.{TABLE}",
    target_table_name=destination_table,
    replace=False,
)
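
As a sketch of the parameters mentioned above, the following variant copies only two columns; the column names are illustrative assumptions.

# selected_fields limits which columns are copied; replace=False appends to the target.
bigquery_to_postgres_subset = BigQueryToPostgresOperator(
    task_id="bigquery_to_postgres_subset",
    dataset_table=f"{DATASET_NAME}.{TABLE}",
    target_table_name=destination_table,
    selected_fields=["id", "name"],  # illustrative column names
    replace=False,
)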

Limitations of Using Apache Airflow

  • Complex configuration: When setting up Apache Airflow for production use, various difficult activities must be completed manually. This includes installing Airflow components, configuring databases, and managing schemas with Airflow db commands.
  • Steep learning curve: Apache Airflow is a powerful tool for orchestrating complicated computational workflows, but it has a steep learning curve, particularly for those unfamiliar with programmatically authoring, scheduling, and monitoring workflows.
  • Scaling Limitations: Although Apache Airflow is a robust platform for orchestrating complicated activities, it, like any other system, has scaling constraints. 

That covers the hands-on ways to connect BigQuery to PostgreSQL. If neither fits your requirements, there is also a fully managed option.

Method 3: Using Hevo Data to Set Up BigQuery to PostgreSQL Integration

The following are the steps to load data from BigQuery to PostgreSQL using Hevo Data:

  • Step 1: Link your Google Cloud Platform account to Hevo’s platform. Hevo includes a built-in BigQuery integration that allows you to connect to your account in minutes.
  • Step 2: Choose PostgreSQL as your destination and begin transferring data.

To learn more about PostgreSQL as a destination connector in Hevo, read through the Hevo documentation.

When I saw Hevo, I was amazed by the smoothness with which it handled so many different sources with zero data loss.

– Swati Singhi, Lead Engineer, Curefit

Use Cases of BigQuery to PostgreSQL Migration

Some of the applications of BigQuery to PostgreSQL migration are:

  • Manufacturing: Migrating to PostgreSQL can help manufacturers optimize their supply chains, foster innovation, and make manufacturing customer-centric.
  • Data Analytics: PostgreSQL enforces data integrity and supports a wide range of data types, which boosts analytics performance.
  • Web Applications: PostgreSQL is a go-to choice for building web applications alongside other open-source technologies such as Linux, Apache, PHP, and Perl (the LAPP stack).
  • E-commerce: Because of PostgreSQL’s scalability and reliability, it is well-suited for e-commerce applications that handle large volumes of product data and transaction records.

Read more about Migrating from MySQL to PostgreSQL.

Conclusion

This article offered an overview of PostgreSQL and BigQuery, along with a description of their features. It then walked through three approaches for transferring data from BigQuery to PostgreSQL.

Although workable, the manual approaches take a lot of time and resources. Migrating data from BigQuery to PostgreSQL can be a time-consuming and tedious operation, but with the help of a data integration solution like Hevo, it can be done with little work and in no time.

FAQ on BigQuery to PostgreSQL

How do you copy data from BigQuery to Postgres?

To copy data from BigQuery to Postgres, first export the data to a file format such as CSV, then import it using Postgres’s COPY command or a tool like psql. You can optionally transform the data along the way to ensure compatibility. A sketch of this route follows.
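
A minimal sketch of that CSV route using the Google Cloud Python clients and psycopg2; the project, dataset, table, bucket, and connection settings are all illustrative assumptions, and the target PostgreSQL table must already exist with a matching schema.

from google.cloud import bigquery, storage
import psycopg2

# Illustrative names; replace with your own.
PROJECT, DATASET, TABLE = "my-project", "my_dataset", "my_table"
BUCKET, BLOB = "my-bucket", "export/my_table.csv"

# 1. Export the BigQuery table to a CSV file in Cloud Storage.
bq = bigquery.Client(project=PROJECT)
bq.extract_table(f"{PROJECT}.{DATASET}.{TABLE}", f"gs://{BUCKET}/{BLOB}").result()

# 2. Download the CSV locally.
storage.Client(project=PROJECT).bucket(BUCKET).blob(BLOB).download_to_filename("my_table.csv")

# 3. Load the CSV into PostgreSQL with COPY.
conn = psycopg2.connect(host="localhost", dbname="demo", user="postgres", password="secret")
with conn, conn.cursor() as cur, open("my_table.csv") as f:
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", f)
conn.close()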

Does BigQuery support PostgreSQL?

BigQuery itself does not support PostgreSQL as a direct, built-in integration, although it can run federated queries against Cloud SQL for PostgreSQL via the EXTERNAL_QUERY function. For ongoing replication, you need one of the approaches described above.

Is BigQuery faster than Postgres?

Due to its distributed computing architecture and parallel processing capabilities, BigQuery is typically faster than PostgreSQL for large-scale analytical queries over massive datasets.

How to connect to PostgreSQL in GCP?

To connect to PostgreSQL in GCP, open the instance details page in the Google Cloud console, configure the firewall rules (authorized networks) so your client can reach the instance, and then connect using a client tool like psql or pgAdmin. A minimal client-side sketch follows.
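
The sketch below assumes a Cloud SQL instance with a public IP and your client IP added as an authorized network; all values are illustrative.

import psycopg2

# Illustrative values; use your instance's public IP and your own credentials.
conn = psycopg2.connect(
    host="203.0.113.10",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="YOUR_PASSWORD",
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()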

Akshaan Sehgal
Marketing Content Analyst, Hevo Data

Akshaan is a dedicated data science enthusiast who is passionate about navigating and leveraging extensive data repositories. His expertise lies in crafting insightful articles on data science, enriched by hands-on training and active involvement in proficient data management tasks. Akshaan excels in A/B testing and optimizing content for enhanced product activations. With a background in Computer Science and a Master's in Management Analytics, he combines theoretical knowledge with practical skills to drive impactful business insights.
