When migrating data from one BigQuery dataset to another, you can organize and manage your data more effectively for different purposes, including development, testing, and production. By segregating data into different datasets, you ensure that changes made during development or testing do not affect the production environment.

Additionally, migration from one BigQuery project to another gives you multiple copies of the same dataset, which helps you recover data in a disaster recovery scenario. BigQuery to BigQuery migration also helps you consolidate data from multiple BigQuery project sources into a central repository and optimize resource usage by distributing workloads across different BigQuery projects.

Let’s look into how you can effortlessly migrate data from BigQuery to BigQuery.

BigQuery: An Overview

BigQuery is a serverless, fully managed data warehouse platform provided by Google Cloud. It integrates with various Google Cloud services and third-party tools, and offers built-in machine learning and business intelligence within a unified platform. With BigQuery, you can quickly handle large-scale data analytics and processing tasks to derive relevant insights.

Scalability and cost-effectiveness are two critical features of BigQuery. You can smoothly process terabytes to petabytes of data without extensive infrastructure management, and you only pay for the resources you use. BigQuery stores data in a compressed, columnar format in Google's global storage system, Colossus, and lets you query massive datasets using SQL. This enables fast petabyte-scale query performance while optimizing storage costs.

Several options are available to access BigQuery, including the Google Cloud Platform (GCP) console, BigQuery REST API calls, the bq command-line tool, and client libraries for languages such as Java and Python.
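
As a quick illustration, here is a minimal sketch of querying BigQuery with the Python client library (google-cloud-bigquery). The project, dataset, table, and column names are hypothetical placeholders:

# Minimal sketch: running a SQL query against BigQuery with the Python client.
# The project, dataset, table, and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # uses Application Default Credentials

query = """
    SELECT name, COUNT(*) AS total
    FROM `my-gcp-project.my_dataset.my_table`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row["name"], row["total"])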


Methods for BigQuery to BigQuery Data Transfer

Just like data migration from BigQuery to any other data warehouse, you can migrate data from BigQuery to BigQuery within the same project or across different projects. Let’s look at two easy methods to effortlessly transfer data from one BigQuery project to another BigQuery project.

Method 1: Using Hevo Data to Transfer Data from BigQuery to BigQuery

Hevo Data is the only real-time, no-code ELT platform that integrates with 150+ sources, offering a low-cost and flexible solution to automate data pipelines according to your requirements. Hevo Data helps you export data from sources, load it into destinations, and transform it for in-depth analysis.

Here are some important features of Hevo:

  • Data Transformation: Before loading data into the destination, you can clean, pre-process, and standardize the data using Python scripts or drag-and-drop transformation blocks.
  • Auto Schema Mapping: Hevo’s Auto Mapping feature helps you to automatically define how the data accessed from your source database should be stored in the destination.
  • Audit Tables: You can create Audit Tables in your destination database to track and monitor all the pipeline activities, including data ingestion, loading, and transformations for data replication.
  • Data Ingestion: Hevo supports different ingestion strategies for different connectors, so data is ingested using either Pull-based or Push-based ingestion.

To migrate your data from BigQuery to BigQuery, you can set up a data pipeline using Hevo Data. This involves setting up BigQuery as the source and BigQuery as the destination using the readily available connectors.

Let’s explore the step-by-step process for setting up the pipeline.

Step 1: Setting up Google BigQuery as Your Pipeline Source

Before you begin, ensure the following prerequisites are in place:

  • Access to a BigQuery project with one or more datasets, each with at least one table. 
  • An active billing account linked to your GCP project.
  • Permissions, including BigQuery Data Editor, Job User, and Data Viewer at the dataset level.
  • Access to an existing Google Cloud Storage (GCS) bucket in the BigQuery project location containing your datasets.
  • The Team Administrator, Team Collaborator, or Pipeline Administrator role in Hevo.

Let’s look into the steps to set up BigQuery as the source end of your pipeline:

  1. Go to the Navigation bar and click PIPELINES.
  2. In the Pipelines List View, click + CREATE.
  3. On the Select Source Type page, select Google BigQuery.
  4. Go to the Configure your BigQuery Account page and connect to your BigQuery data warehouse using one of the following approaches:
    • Connect with a user account by following the steps below:
      • Choose an already configured account, and then click on CONTINUE.
      • Create a BigQuery Account by clicking + ADD BIGQUERY ACCOUNT.
Configuring your BigQuery account

  • Configure the BigQuery account using the following steps:
    • Sign in to your BigQuery account.
    • Enable Hevo to access your data by clicking on the Allow option.
Enabling Hevo to access data

  • Connect with a service account by following the steps below:
    • Choose an already configured account, and then click CONTINUE.
    • Upload the Service Account Key JSON file generated in GCP and click CONFIGURE BIGQUERY ACCOUNT.
Uploading Service Account Key

  5. When you’re redirected to the Configure Your BigQuery Source page, fill in the following fields:
Configuring your BigQuery Source

  • Pipeline Name: A unique name for your pipeline, not exceeding 255 characters.
  • Authorized User/Service Account: A non-editable field pre-filled with the email address selected for connecting to your BigQuery account.
  • Project ID: The project in which you want to create your pipeline. Choose the required project ID from the drop-down menu.
  • GCS Bucket: An optional field for the name of an existing GCS bucket to which Hevo exports the fetched data before streaming it to the destination.
  • Select Dataset ID: Choose the datasets that contain data tables.
  • Advanced Settings:
    • Enable Include New Tables in the Pipeline to automatically fetch data from the tables generated after the Pipeline is created.
    • Disable Include New Tables in the Pipeline to list the new tables in Pipelines Detailed View in Skipped State. You can manually add the required tables to the list and load their historical data.
  6. Click on the TEST & CONTINUE option.

To learn more about the BigQuery configuration as a source, refer to the Hevo Documentation for Google BigQuery Source Connector page.

Step 2: Setting up Google BigQuery as Your Pipeline Destination

Before you get started, ensure the following prerequisites are in place:

  • Access to the Google Cloud Platform (GCP); set it up if you do not have an account.
  • The connecting BigQuery account must have the essential roles for the GCP project if it does not already hold an Owner or Admin role.
  • An active billing account linked to your GCP project.
  • You must have a Team Collaborator or any other administrator role except the Billing Administrator role in Hevo to create the destination.

Let’s look into the steps to set up BigQuery as the destination end of your pipeline:

  1. In the Navigation Bar, click DESTINATIONS.
  2. Go to Destinations List View, and click + CREATE.
  3. Choose Google BigQuery as the destination type from the Add Destination page.
  4. Fill in the following fields in the Configure your Google BigQuery Warehouse page.
Configuring BigQuery Warehouse

  • Destination Name: A unique name for your destination.
  • Account: To authenticate and connect to BigQuery, choose a service account or a user account.
    • To connect with a user account, do the following:
      • Click on + ADD GOOGLE BIGQUERY ACCOUNT.
      • Sign in to your BigQuery account and select the Allow option for Hevo to access your data. 
Allowing Hevo to Access the Data

  • To connect with a service account, upload the Service Account Key JSON file created in GCP.
  • Project ID: Choose the required project ID for your BigQuery instance.
  • Dataset: Pick one of the following ways to specify a dataset for your BigQuery project.
    • Enable the Automatically create a dataset option and check whether you have the necessary permissions to create a dataset.
Automatically Create a Dataset

  • If you have the required permissions, Hevo automatically creates a dataset for your selected project ID with the name hevo_dataset_<Project_ID>.

Example: If the Project ID is westeros-153019, a dataset named hevo_dataset_westeros_153019 is created.

  • If you do not have the required permissions, Hevo will provide a warning related to insufficient permissions, and you must be assigned the required permissions from the GCP console.

To get the required permission, read Updating roles for an existing user account or Updating roles for a Google service account. 

  • After getting the necessary permissions, click the Check Again option available in the Hevo User Interface.
  • Disable Automatically create a dataset and manually perform one of the following:
Manually Selecting a BigQuery Project Dataset

  • Choose your required dataset from the Dataset drop-down.
  • If you cannot find the required dataset from the drop-down, then click on + CREATE “<Dataset_Name>” as a new Dataset where <Dataset_Name> is your required dataset name.
  • GCS Bucket: Use one of the following ways to set up a GCS bucket for your BigQuery project.
    • Enable the Automatically create a GCS bucket option and check whether you have the necessary permissions to create a GCS bucket.
Automatically Create a GCS Bucket

  • If you have the required permissions, Hevo automatically creates a GCS bucket for your selected project ID with the name hevo_bucket_<Project_ID>.

Example: If the Project ID is westeros-153019, a GCS bucket with the name hevo_bucket_westeros_153019 is created.

  • If you do not have the required permissions, Hevo warns you about insufficient permissions. After you are granted the necessary permissions, click the Check Again option on the Hevo User Interface.
  • Disable Automatically create a GCS bucket option and manually do one of the following:
Manually Selecting a GCS Bucket for BigQuery Project

  • Choose your desired bucket from the GCS Bucket drop-down.
  • If you cannot find the required bucket from the drop-down, then click on + CREATE “<Bucket_Name>” as a new Bucket where <Bucket_Name> is your required bucket name. 
  • Advanced Settings: 
    • Enable Populate Loaded Timestamp to append the __hevo_loaded_at column to the destination table. This column denotes the time at which the event was loaded to the destination table.
    • Enable Sanitize Table/Column Names to remove all spaces and non-alphanumeric characters from table and column names, replacing them with an underscore (_).

To learn more about how to sanitize Table/Column names, read Name Sanitization.

  5. Click on the TEST CONNECTION button.
  6. Click on the SAVE & CONTINUE button.

To learn more about BigQuery configuration as a destination, refer to the Hevo Documentation for Google BigQuery Destination Connector page.

If you’re searching for the best practices for creating source and destination pipelines, read Best Practices for Creating Database Pipelines.

Method 2: Using CSV Export/Import to Transfer Data from BigQuery to BigQuery

This method involves exporting the data from a BigQuery project dataset to a CSV file and importing the CSV file into another BigQuery project dataset.

Let’s explore the steps in detail.

Step 1: Export Data from BigQuery

There are three main ways to export data from a BigQuery dataset to a CSV file:

  1. Using Google Cloud Console
  2. Using the bq extract command in the bq command-line tool.
  3. By submitting an extract job through client libraries or APIs. 

Google Cloud Console is the easiest method to export data from a BigQuery project to a CSV file because it does not need extensive technical knowledge. Follow the steps below to export your data using Google Cloud Console.

  1. Go to Google Cloud Console and open the BigQuery page.
  2. In the Explorer panel, select the table you want to export from your BigQuery project.
  3. In the Details panel, click the Export button and choose the Export to Cloud Storage option.
  4. An Export Table to Cloud Storage dialog box will appear on your screen. Fill in the following fields:
Export BigQuery Table to a CSV file

  • Select GCS Location: Browse for a destination bucket, folder, or file to export your data.
  • Export Format: Select CSV from the drop-down menu.
  • Compression: Select GZIP compression for the CSV file or None for no compression.
  5. Click on the Export button to export the selected BigQuery table to a CSV file.
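
If you prefer to script the export (options 2 and 3 above), here is a minimal sketch that submits an extract job with the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are hypothetical:

# Minimal sketch: exporting a BigQuery table to CSV in a GCS bucket via an extract job.
# All project, dataset, table, and bucket names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="source-project")

source_table = "source-project.source_dataset.source_table"
destination_uri = "gs://my-export-bucket/exports/source_table-*.csv"  # wildcard splits large exports into multiple files

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,  # optional; mirrors the console's GZIP choice
)

extract_job = client.extract_table(source_table, destination_uri, job_config=job_config)
extract_job.result()  # waits for the export to complete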

Step 2: Import CSV Data into Another BigQuery Project

Just as you can use Google Cloud Console to export data, you can also use it to import CSV data into another BigQuery project dataset. Follow the steps below to import data from a CSV file into BigQuery using Google Cloud Console.

  1. Open the BigQuery page from the Google Cloud Console.
  2. From the Explorer panel, select the BigQuery project into which you want to import CSV data.
  3. Navigate to the Dataset info section, and click the Create Table option.
Create a Table in a BigQuery Project

  4. A dialog box called Create Table will appear on your screen. You must specify the required fields.
Configuring Create Table Panel

  • Under the Source section, fill in the following fields:
    • Create a table from: You can choose Google Cloud Storage, Upload, or Drive from the drop-down menu.
    • File format: Choose CSV file format from the drop-down menu.
  • Under the Destination section, fill in the following fields:
    • Project: The name of the destination BigQuery project.
    • Dataset: The name of the dataset in the destination BigQuery project.
    • Table: A name for the new table in the destination BigQuery project.
    • Table type: Specify the type of the table.
  • Under the Schema section, choose one of the following options:
    • Enable Auto-Detect to generate the schema information automatically.
    • Enable Edit as text to enter the schema information manually.
  5. Click the Create Table button to create a new table in the BigQuery project by loading the attached CSV data.
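
Equivalently, if you want to script this load step instead of using the console, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are hypothetical:

# Minimal sketch: loading an exported CSV file from GCS into a table in another BigQuery project.
# All project, dataset, table, and bucket names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="destination-project")

table_id = "destination-project.destination_dataset.destination_table"
source_uri = "gs://my-export-bucket/exports/source_table-000000000000.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row written by the export
    autodetect=True,       # mirrors the console's Auto-Detect schema option
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # overwrite the table if it exists
)

load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
load_job.result()  # waits for the load to complete
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}.")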

Limitations of CSV Export/Import for BigQuery to BigQuery Data Migration

  • Slow Performance: CSV export/import is a time-consuming and resource-intensive process for medium-to-large datasets. Exporting data to CSV files and then importing them back into BigQuery can lead to a longer time for data migration.
  • Data Integrity: During CSV export/import, data types or schema may not be preserved accurately. The data types or schema in the CSV file may mismatch with the destination database after the migration.
  • Schema Evolution: Since CSV files do not preserve BigQuery's schema information, data transformations may be required before or after exporting/importing the data, making the process effort-intensive.
  • No Incremental Updates: The CSV export/import does not include built-in incremental updates. If data changes frequently in the source database, the entire dataset must be transferred to the destination database, which may lead to data loss or inefficiencies.

Use Cases of BigQuery to BigQuery Data Migration

  • Quick Export/Import: When migrating data from BigQuery to BigQuery, transformation is not required as both the source and destination schema are the same. This can save you time and effort when exporting and importing the data.
  • Create backups: Replicating data from BigQuery to BigQuery helps create backups and maintain a separate platform for development, testing, and production.
  • Data Consolidation: When migrating data from one or more BigQuery projects to a single BigQuery project, you can streamline data management and simplify the in-depth analytics tasks.

Conclusion

Data migration from BigQuery to BigQuery is possible, just like data migration from BigQuery to another data warehouse. Compared to other data migrations, it is fairly simple and requires less time, especially since the schemas for the source and destination are the same.

Methods for BigQuery to BigQuery data migration are not limited to one or two. You can choose either of the methods in this article for data migration. The manual CSV export/import method is time-consuming and resource-intensive, leading to slow performance. To reduce the time and effort required for data migration, you should choose Hevo Data.

To transfer data from a BigQuery table to another, you can also utilize the BigQuery to BigQuery operator provided by the Google Cloud Platform. Read about Google Cloud BigQuery to BigQuery Operator.

Frequently Asked Questions (FAQs)

1. How do you migrate data between different BigQuery tables within Google Cloud Platform?

To migrate data across BigQuery tables within Google Cloud Platform, you can use the BigQueryToBigQueryOperator from Apache Airflow's Google provider. With this operator, you can define values for the tables dynamically by using Jinja templating with source_project_dataset_tables, destination_project_dataset_table, impersonation_chain, and labels. In addition, you can copy multiple source tables into a destination table and control how the data is written by defining write_disposition and create_disposition.
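
For illustration, here is a minimal sketch of an Airflow DAG that uses this operator; the DAG ID, project, dataset, and table names are hypothetical, and the exact DAG arguments may vary with your Airflow version:

# Minimal sketch: copying a table across BigQuery projects with BigQueryToBigQueryOperator.
# The DAG ID and all project, dataset, and table names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_bigquery import (
    BigQueryToBigQueryOperator,
)

with DAG(
    dag_id="bq_to_bq_copy",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # trigger manually; adjust for your Airflow version
    catchup=False,
) as dag:
    copy_table = BigQueryToBigQueryOperator(
        task_id="copy_source_table",
        source_project_dataset_tables="source-project.source_dataset.source_table",
        destination_project_dataset_table="destination-project.destination_dataset.destination_table",
        write_disposition="WRITE_TRUNCATE",      # overwrite the destination table if it exists
        create_disposition="CREATE_IF_NEEDED",   # create the destination table if missing
    )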

2. When performing BigQuery to BigQuery migrations using Hevo, I encountered the limitation that a maximum of 4096 columns can be replicated to each BigQuery table. How can I address this limitation using Hevo itself?

To ensure smooth replication from BigQuery to BigQuery, you can use Hevo's data transformations, such as handling missing columns, filtering, and aggregations, to remove the columns you no longer require in your BigQuery table.

Sony Saji
Technical Writer, Hevo Data

Sony is a former Computer Science teacher turned technical content writer, specializing in crafting blogs and articles on topics such as ML, AI, Python Frameworks, and other emerging trends in Data Science and Analytics.
