Are you trying to derive deeper insights from PostgreSQL by moving the data into a Data Warehouse like Google BigQuery? Well, you have landed on the right article. Now, it has become easier to replicate data from PostgreSQL to BigQuery.

This article will give you a brief overview of PostgreSQL and Google BigQuery. You will also get to know how you can set up your PostgreSQL to BigQuery integration using 2 methods.

Moreover, the limitations in the case of the manual method will also be discussed in further sections. Read along to decide which method of connecting PostgreSQL to BigQuery is best for you.

Introduction to PostgreSQL

PostgreSQL, although primarily used as an OLTP Database, is one of the popular tools for analyzing data at scale. Its novel architecture, reliability at scale, robust feature set, and extensibility give it an advantage over other databases.

Introduction to Google BigQuery

Google BigQuery is a serverless, cost-effective, and highly scalable Data Warehousing platform with Machine Learning capabilities built-in.

It relies on BI Engine to carry out its operations, combining fast SQL queries with the processing power of Google’s infrastructure to manage business transactions, data from several databases, and access control over which users can view and query data.

BigQuery is used by several firms, including UPS, Twitter, and Dow Jones. UPS uses it to predict the exact volume of packages across its various services, while Twitter uses it to help with ad updates and to combine millions of data points per second.

BigQuery offers the following features for data privacy and the protection of your data:

  • Encryption at rest 
  • Integration with Cloud Identity
  • Network isolation
  • Access Management for granular access control

Methods to Set up PostgreSQL to BigQuery Integration

For the scope of this blog, the main focus will be on detailing the steps and challenges of each method. By the end, you will know enough about both approaches to make the right choice. Below are the 2 methods:

Method 1: Using Hevo Data to Set Up PostgreSQL to BigQuery Integration

The steps to load data from PostgreSQL to BigQuery using Hevo Data are as follows:

  • Step 1: Connect your PostgreSQL account to Hevo’s platform. Hevo has an in-built PostgreSQL Integration that connects to your account within minutes.

The available ingestion modes are Logical Replication, Table, and Custom SQL. Additionally, the XMIN ingestion mode is available for Early Access. Logical Replication is the recommended ingestion mode and is selected by default.

  • Step 2: Select Google BigQuery as your destination and start moving your data.

With this, you have successfully set up Postgres to BigQuery replication using Hevo Data.

Here are more reasons to try Hevo:

  • Schema Management: Hevo takes away the tedious task of schema management by automatically detecting the schema of incoming data and mapping it to the destination schema.
  • Incremental Data Load: Hevo allows the transfer of only the data that has been modified, in real time. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.

Method 2: Manual ETL Process to Set Up PostgreSQL to BigQuery Integration

To execute the following steps, you need a pre-existing database and a table populated with PostgreSQL records. 

Let’s take a detailed look at each step.

Step 1: Extract Data From PostgreSQL

The data from PostgreSQL needs to be extracted and exported into a CSV file. To do that, run the following command from your PostgreSQL client.

COPY your_table_name TO '/path/to/your_file.csv' WITH (FORMAT CSV, HEADER);
PostgreSQL workbench

After the data is successfully exported to a CSV file, you should see the above message on your console.
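Note that COPY … TO writes the file on the database server itself and requires file-access privileges there. If you are connecting to a remote server, a minimal client-side alternative is psql’s \copy meta-command, which writes the file to your local machine (the file name below is a placeholder):

\copy your_table_name TO 'your_file.csv' WITH (FORMAT CSV, HEADER)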

Step 2: Clean and Transform Data

To upload the data to Google BigQuery, the tables and the data need to be compatible with the BigQuery format. Keep the following things in mind while migrating data to BigQuery:

  • BigQuery expects CSV data to be UTF-8 encoded. 
  • BigQuery doesn’t enforce Primary Key and Unique Key constraints, so your ETL process must handle deduplication itself (a deduplication sketch is shown after this list).
  • Postgres and BigQuery have different column types; however, most of them are convertible. For example, integer and bigint map to INT64, numeric to NUMERIC, real and double precision to FLOAT64, boolean to BOOL, varchar and text to STRING, date to DATE, and timestamp to TIMESTAMP or DATETIME. The following table lists common data types and their equivalent conversion types in BigQuery.
Transformation data

You can visit the official BigQuery documentation to know more about BigQuery data types.

  • DATE values must be dash (-) separated and in the form YYYY-MM-DD (year-month-day).

Fortunately, the default date format in Postgres is the same, YYYY-MM-DD. So if you are simply selecting date columns, they should already be in the correct format.

The TO_DATE function in PostgreSQL helps in converting string values into dates.

If the data is stored as a string in the table for any reason, it can be converted while selecting data.

Syntax : TO_DATE(str, format)
Example : SELECT TO_DATE('31,12,1999', 'DD,MM,YYYY');
Result : 1999-12-31
  • In TIMESTAMP type, the hh:mm:ss (hour-minute-second) portion must use a colon (:) separator.

Similar to the Date type, the TO_TIMESTAMP function in PostgreSQL is used to convert strings into timestamps.

Syntax : TO_TIMESTAMP(str,format)
Example : SELECT TO_TIMESTAMP('2017-03-31 9:30:20','YYYY-MM-DD HH:MI:SS');
Result: 2017-03-31 09:30:20-07
  • Make sure text columns are quoted if they can potentially have delimiter characters.
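For example, since BigQuery will not reject duplicate keys for you, one way to handle the Primary Key point above is to deduplicate on the PostgreSQL side while exporting. A minimal sketch, assuming a hypothetical emp table with an emp_id key and an updated_at column (adjust the names and file path to your schema):

-- keep only the most recent row per emp_id in the exported CSV
COPY (
  SELECT DISTINCT ON (emp_id) *
  FROM emp
  ORDER BY emp_id, updated_at DESC
) TO '/path/to/emp.csv' WITH (FORMAT CSV, HEADER);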

Step 3: Upload to Google Cloud Storage (GCS) bucket

If you haven’t already, you need to create a storage bucket in Google Cloud for the next step.

3. a) Go to your Google Cloud account and select Cloud Storage → Buckets.

Google Cloud Platform

3. b) Select a bucket from your existing list of buckets. If you do not have an existing bucket, you must create a new one. You can follow Google’s official documentation to create a new bucket.

 GCP buckets

3. c) Upload your .csv file into the bucket by clicking the upload file option. Select the file that you want to upload.

GCP file upload

Step 4: Upload to BigQuery table from GCS

4. a) Go to the Google Cloud console and select BigQuery from the dropdown. Once you do so, a list of project IDs will appear. Select the Project ID you want to work with and select Create Dataset.

BigQuery console

4. b) Provide the configuration per your requirements and create the dataset.

 Creating Dataset in BigQuery

Your dataset should be successfully created after this process.
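If you prefer SQL to the console, the dataset can also be created from the BigQuery query editor. A minimal sketch, with a placeholder project ID, dataset name, and location:

-- create the dataset (BigQuery calls it a schema) if it does not already exist
CREATE SCHEMA IF NOT EXISTS `your_project_id.your_dataset`
OPTIONS (location = 'US');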

4. c) Next, you must create a table in this dataset. To do so, select the project ID where you created the dataset, then select the dataset that you just created. Then click on Create Table from the menu that appears at the side.

Creating Table in BigQuery

4. d) To create the table, select Google Cloud Storage as the source. Next, select the correct GCS bucket and the .csv file in it. Then, select the file format that matches the file in the bucket; in this case, it is the .csv format. You must also provide a name for the table in the BigQuery dataset. Enable schema auto-detection (the automapping option) if you want to migrate the data as it is.

 Configuration for creating table

4. e) Your table should now be created and loaded with the data from PostgreSQL.

Table Preview
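As an alternative to the console wizard in step 4. d), the CSV sitting in GCS can also be loaded with a single BigQuery SQL statement. A minimal sketch, using placeholder project, dataset, table, and bucket names; schema auto-detection applies because no explicit schema is supplied:

-- append the CSV from GCS into the table, creating the table if it does not exist
LOAD DATA INTO `your_project_id.your_dataset.emp`
FROM FILES (
  format = 'CSV',
  uris = ['gs://your_bucket/your_file.csv'],
  skip_leading_rows = 1  -- skip the CSV header row
);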

Step 5: Query the table in BigQuery

After loading the table into BigQuery, you can query it by selecting the QUERY option above the table. You can query your table using standard SQL syntax.

Query table option

Note: Mention the correct project ID, dataset name, and table name.

Query

The above query extracts records from the emp table where the job is manager.
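For reference, such a query would look like the following sketch (the project ID, dataset name, and the exact value stored in the job column are placeholders; adjust them to your data):

SELECT *
FROM `your_project_id.your_dataset.emp`
WHERE job = 'MANAGER';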

Advantages of manually loading the data from PostgreSQL to BigQuery:

  1. Manual migration doesn’t require setting up and maintaining additional infrastructure, which can save on operational costs.
  2. Manual migration processes are straightforward and involve fewer components, reducing the complexity of the operation.
  3. You have complete control over each step of the migration process, allowing for customized data handling and immediate troubleshooting if issues arise.
  4. By manually managing data transfer, you can ensure compliance with specific security and privacy requirements that might be critical for your organization.

Does PostgreSQL Work As a Data Warehouse? 

Yes, you can use PostgreSQL as a data warehouse. However, the main challenges are:

  • A data engineer will have to build a data warehouse architecture on top of PostgreSQL’s existing design. To store data and build models, you will need to create multiple interlinked databases. And since PostgreSQL lacks advanced analytics and reporting capabilities, this further limits its usefulness.
  • PostgreSQL can’t handle the processing of huge data volumes. Data warehouses offer features such as parallel processing for advanced queries, which PostgreSQL lacks, so this level of scalability and performance with minimal latency is not possible with the database.

Limitations of the Manual Method:

  1.  The manual migration process can be time-consuming, requiring significant effort to export, transform, and load data, especially if the dataset is large or complex.
  2. Manual processes are susceptible to human errors, such as incorrect data export settings, file handling mistakes, or misconfigurations during import.
  3. If the migration needs to be performed regularly or involves multiple tables and datasets, the repetitive nature of manual processes can lead to inefficiency and increased workload.
  4. Manual migrations can be resource-intensive, consuming significant computational and human resources, which could be utilized for other critical tasks.

Conclusion

  • Migrating data from PostgreSQL to BigQuery manually can be complex, but automated data pipeline tools can significantly simplify the process.
  • We’ve discussed two methods for moving data from PostgreSQL to BigQuery: the manual process, which requires a lot of configuration and effort, and automated tools like Hevo Data.
  • Whether you choose a manual approach or leverage data pipeline tools like Hevo Data, following the steps outlined in this guide will help ensure a successful migration.

FAQ on PostgreSQL to BigQuery

How do you transfer data from Postgres to BigQuery?

To transfer data from PostgreSQL to BigQuery, export your PostgreSQL data to a format like CSV or JSON, then use BigQuery’s data import tools or APIs to load the data into BigQuery tables.

Can I use PostgreSQL in BigQuery?

No, BigQuery does not natively support PostgreSQL as a database engine. It is a separate service with its own architecture and SQL dialect optimized for large-scale analytics and data warehousing.

Can PostgreSQL be used for Big Data?

Yes, PostgreSQL can handle large datasets and complex queries effectively, making it suitable for many big data applications, although a dedicated warehouse such as BigQuery scales better for very large analytical workloads.

How do you migrate data from Postgres to Oracle?

To migrate data from PostgreSQL to Oracle, use Oracle’s Data Pump utility or SQL Developer to export PostgreSQL data as SQL scripts or CSV files, then import them into Oracle using SQL Loader or SQL Developer.

Skand Agrawal
Customer Experience Engineer, Hevo Data

Skand is a dedicated Customer Experience Engineer at Hevo Data, specializing in MySQL, Postgres, and REST APIs. With three years of experience, he efficiently troubleshoots customer issues, contributes to the knowledge base and SOPs, and assists customers in achieving their use cases through Hevo's platform.
