Understanding BigQuery Copy Tables Simplified 101

• January 28th, 2022


Nowadays, many people are willing to share their personal data on digital platforms in exchange for offers or discounts. As a growing business, you should take advantage of this and use that data to convert leads into sales. Tools like Google BigQuery are great for enterprise-level marketing teams: the Data Warehouse solution helps you manage such consumer data and perform analysis for better decision-making.

In this article, you will learn about Google BigQuery and how to copy BigQuery tables. You will also learn the different ways to perform a Google BigQuery Copy Table operation using the bq command-line tool, the API, and the BigQuery Console.


Introduction to Google BigQuery


Google BigQuery is a Cloud Data Warehouse with a serverless architecture. It was introduced to manage and analyze large data sets. Its built-in integrations make it one of the most cost-effective and fastest Data Warehouse solutions. It is highly scalable and can run SQL queries on large datasets in seconds.

Google BigQuery is part of the Google Cloud Platform (GCP), which means it can leverage other Google Cloud services to deliver the best results. It is built on Google's Dremel technology, and users can scale storage and compute independently according to their needs.

Key Features of Google BigQuery

Some of the main features of Google BigQuery are listed below:

  • Fault-tolerant Architecture: Google BigQuery's fault-tolerant architecture protects your data from loss in the event of a disaster.
  • Automatic Backup: Google BigQuery automatically backs up data and provides recovery options to keep it secure.
  • Google BigQuery ML: BigQuery ML lets users easily build, test, and deploy Machine Learning models directly on data stored in Google BigQuery.


Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline, helps you load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 100+ data sources and requires just 3 steps: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline delivers data in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that data is handled in a secure, consistent manner with zero data loss, and it supports different forms of data. The solution is consistent and works with different BI tools as well.

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to work with and perform operations on.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Required Permissions for Copying BigQuery Data

Permissions Needed to Copy Tables and Partitions

To copy tables and partitions, you need IAM permissions on both the source and destination datasets:

  • For the source dataset, you need bigquery.tables.get and bigquery.tables.getData.
  • For the destination dataset, you need bigquery.tables.create.

Permissions Needed to Run Google BigQuery Copy Jobs

The bigquery.jobs.create IAM permission is required to run a Google BigQuery copy job. Each of the following predefined IAM roles includes this permission:

  • roles/bigquery.user
  • roles/bigquery.jobUser
  • roles/bigquery.admin

Remember, once you have been granted these IAM permissions, you can copy both tables and partitions.
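For example, a project administrator could grant the roles/bigquery.jobUser role with the gcloud CLI; this is only a sketch, and the project ID and user email below are placeholders:

# Grant the BigQuery Job User role so the user can create copy jobs
gcloud projects add-iam-policy-binding my-project-id \
    --member="user:analyst@example.com" \
    --role="roles/bigquery.jobUser"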

Different Ways for Google BigQuery Copy Tables

The various ways to copy tables in Google BigQuery are listed below:

  • Use the Google Cloud Console
  • Use the bq command-line tool
  • Try out jobs.insert API method
  • Use client libraries

Google BigQuery Copy Tables: Single Source

You can use the Cloud Console, the bq command-line tool, the jobs.insert API method, or the client libraries to copy a single table in BigQuery. Keep in mind that a copy job run from the Cloud Console supports only one source table and one destination table. To copy multiple source tables into a destination table, use the bq command-line tool or the API.

Follow these methods to copy a single source table:

Using the BigQuery Console

  1. In the Explorer panel, expand your project and dataset, then select the table.
  2. In the details panel, click Copy table.
  3. In the Copy table dialog that appears, under Destination:
  • For Project name, select the project that will store the copied table.
  • For Dataset name, choose the dataset where you want to store the copied table. Remember, the source and destination datasets must be in the same location.
  • For Table name, enter a name for the new table. Make sure it is unique, as you cannot overwrite an existing table in the destination dataset.
  4. Click Copy to start the Google BigQuery Copy Table job.

Using the Bq Command

Run the bq cp command and use optional flags to control what happens when the destination table already exists.

  • The -a or --append_table flag appends data from the source table to the existing table in the destination dataset.
  • The -f or --force flag overwrites an existing table in the destination dataset and does not prompt for confirmation.
  • If the table already exists in the destination dataset, the -n or --no_clobber flag returns the error message: Table 'project_id:dataset.table' already exists, skipping. If -n is not specified, you are prompted to choose whether to replace the destination table.
  • The --destination_kms_key flag specifies a customer-managed Cloud KMS key used to encrypt the destination table.

If the source or destination dataset is in a project other than your default project, add the project ID to the dataset name in the format project_id:dataset.

bq --location=location cp \
-a -f -n \
project_id:dataset.source_table \
project_id:dataset.destination_table

Replace the following:

  • location: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • project_id: the ID of your project.
  • dataset: the name of the source or destination dataset.
  • source_table: the table that you are copying.
  • destination_table: the name of the table in the destination dataset.
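For instance, a minimal sketch with hypothetical project, dataset, and table names, copying a table in the US location without overwriting an existing destination table, could look like this:

# Copy mydataset.mytable to otherdataset.mytable_copy; -n skips the copy if the target already exists
bq --location=US cp -n \
myproject:mydataset.mytable \
myproject:otherdataset.mytable_copy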

Using APIs

To copy an existing table via the API, call the jobs.insert method and configure a copy job. Specify your location in the location property of the jobReference section of the job resource. The copy configuration looks like this:

"copy": {
      "sourceTable": {       // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "destinationTable": {  // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "createDisposition": string,  // Optional
      "writeDisposition": string,   // Optional
    },

Here,

  • sourceTable specifies the table to be copied.
  • destinationTable specifies the table to which the data is copied.
  • createDisposition specifies whether the destination table should be created if it does not exist.
  • writeDisposition specifies whether to append to or overwrite an existing destination table.
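As a rough sketch, such a copy job could be submitted to the jobs.insert REST endpoint with curl, assuming you are authenticated through the gcloud CLI; the project, dataset, and table names below are hypothetical:

# Submit a copy job via the BigQuery REST API (jobs.insert)
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/jobs" \
  -d '{
    "jobReference": {"projectId": "myproject", "location": "US"},
    "configuration": {
      "copy": {
        "sourceTable":      {"projectId": "myproject", "datasetId": "mydataset",    "tableId": "mytable"},
        "destinationTable": {"projectId": "myproject", "datasetId": "otherdataset", "tableId": "mytable_copy"},
        "writeDisposition": "WRITE_TRUNCATE"
      }
    }
  }'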

Google BigQuery Copy Tables: Multiple Source

You can use the bq command-line tool, the jobs.insert API method, or the client libraries to copy multiple source tables into a single destination table. Keep in mind that whichever method you use, a copy job supports only one destination table. Also, all source tables must have identical schemas and must be specified as a comma-separated list; wildcards cannot be used in this case.

Follow these methods to copy multiple source tables:

Using the BigQuery Console

Google Cloud Console currently does not support the copying of multiple tables in Google BigQuery.

Using the Bq Command

Run the bq cp command and specify the source tables as a comma-separated list. Use optional flags to control what happens when the destination table already exists.

  • The -a or --append_table flag appends data from the source tables to an existing table in the destination dataset.
  • The -f or --force flag overwrites an existing table in the destination dataset and does not prompt for confirmation.
  • If the table already exists in the destination dataset, the -n or --no_clobber flag returns the error message: Table 'project_id:dataset.table' already exists, skipping. If -n is not specified, you are prompted to choose whether to replace the destination table.
  • The --destination_kms_key flag specifies a customer-managed Cloud KMS key used to encrypt the destination table.

If the source or destination dataset is in a project other than your default project, add the project ID to the dataset name in the format project_id:dataset.

bq --location=location cp \
-a -f -n \
project_id:dataset.source_table,project_id:dataset.source_table \
project_id:dataset.destination_table

Replace the following:

  • location: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • project_id: the ID of your project.
  • dataset: the name of the source or destination dataset.
  • source_table: the tables that you are copying, separated by commas.
  • destination_table: the name of the table in the destination dataset.
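For instance, a sketch with hypothetical project, dataset, and table names that merges two tables with identical schemas into one destination table:

# Copy two source tables with identical schemas into a single destination table
bq cp \
myproject:mydataset.sales_2021,myproject:mydataset.sales_2022 \
myproject:mydataset.sales_all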

Using APIs

To copy data from multiple tables via the API, call the jobs.insert method, configure a copy job, and specify the source tables using the sourceTables property. Specify your region in the location property of the jobReference section of the job resource.
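The copy configuration mirrors the single-source snippet shown earlier, except that a sourceTables array replaces sourceTable, roughly as follows:

"copy": {
      "sourceTables": [        // Required: one entry per source table
        {
          "projectId": string,
          "datasetId": string,
          "tableId": string
        }
      ],
      "destinationTable": {    // Required
        "projectId": string,
        "datasetId": string,
        "tableId": string
      },
      "createDisposition": string,  // Optional
      "writeDisposition": string    // Optional
    },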

Limitations of Google BigQuery Copy Tables

There are certain limitations to copying tables in Google BigQuery, listed below:

  • When using the Cloud Console to copy a table, you cannot overwrite an existing table in the destination dataset; the copied table must have a unique name in the destination dataset.
  • The source and destination datasets must reside in the same location. For example, you cannot copy a table from an EU-based dataset and write it to a US-based dataset.
  • When you copy a table, the name of the destination table must follow the same naming conventions that apply when you create a BigQuery table.
  • The Cloud Console does not support copying multiple source tables into a destination table.
  • When copying multiple source tables with the API or the bq command-line tool, all source tables must have identical schemas.
  • Because BigQuery manages the underlying storage dynamically, the time taken by copy jobs may vary across different runs.

Conclusion

In this post, you learned about Google BigQuery, its key features, and how to copy data from single or multiple source tables. Google BigQuery is one of the most cost-effective Data Warehouse solutions and can be highly advantageous for Analytics and consumer-insights teams. You can import, process, and export data using this platform. Additionally, it helps you create reports and make better decisions for your business.

Visit our Website to Explore Hevo

Companies store valuable data from multiple data sources in Google BigQuery. Manually transferring data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to Google BigQuery. It fully automates the process of loading and transforming data from 100+ sources to a destination of your choice without writing a single line of code.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about the Google BigQuery Copy Table in the comments section below!
