BigQuery to Redshift: Data Migration Guide Simplified

Amazon Redshift, Data Integration, Data Migration, Data Warehouses, Google BigQuery • June 6th, 2022

Once your organization has settled on the use case to build its product around, the next step is to set up a cloud data warehouse. Choosing the right one matters, because the warehouse underpins an enterprise’s analysis, reporting, and BI functions.

That said, cloud-based data warehouses like Redshift and BigQuery have become industry standards across almost every part of the business intelligence process. Their architectures run complex analytical queries quickly by leveraging specialized data storage, processing, and compilation techniques.

Sometimes, though, workflow challenges hamper productivity, and the need to diversify across platforms arises. This tutorial is for those who want to migrate data from BigQuery to Redshift. Keep reading to learn how.

Table of Contents

  1. What is Google BigQuery?
  2. What is Amazon Redshift?
  3. BigQuery vs Redshift: Use Case Comparison
  4. How to migrate data from BigQuery to Redshift?
  5. Conclusion

What is Google BigQuery?

Google BigQuery, a giant in the cloud data warehousing industry, helps enterprises of all sizes manage and analyze data seamlessly. BigQuery can be described as an all-in-one solution for your analytics needs, and here’s why. Using BigQuery’s built-in features for machine learning, geospatial analysis, and business intelligence, enterprises can leverage structured information to build upon previous use cases. Moreover, it’s fast, too. Google claims it can query terabytes in seconds and petabytes in minutes.

BigQuery’s managed infrastructure takes care of resource management, which lets data teams focus on their company’s data. Its services are designed to cater to today’s data technology needs. In short, Google BigQuery has three parts: BigQuery Storage, BigQuery Analytics, and BigQuery Administration.

You can read more about Google BigQuery, how it works, and its premium features in Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale

What is Amazon Redshift?

A cleverly named product, Amazon Redshift is said to signify a shift away from “Big Red,” a nickname for Oracle. Redshift is a petabyte-scale data warehouse solution built and designed for data scientists, data analysts, data administrators, and software developers. Its parallel processing and compression algorithms allow users to perform operations on billions of rows, reducing command execution time significantly. Redshift is well suited to analyzing large quantities of data with today’s business intelligence tools.

BigQuery vs Redshift: Use Case Comparison

Here’s a brief overview of how you can leverage BigQuery and Redshift.

Redshift is best for round-the-clock computational needs, such as NASDAQ daily reporting, automated ad-bidding, and live dashboards. Redshift users pay an hourly rate that depends on the instance type and the number of nodes deployed. For detailed pricing information, see the Amazon Redshift pricing page.

BigQuery is best for sporadic workloads that are occasionally idle but sometimes run millions of queries. It suits e-commerce applications, ad-hoc reporting, and gaining insights into consumer behavior using machine learning. BigQuery bills its users along two dimensions: query-based pricing and storage pricing. For detailed pricing information, see the Google BigQuery pricing page.
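To make the difference between the two billing styles concrete, here is a minimal sketch of the arithmetic. It assumes a provisioned Redshift cluster billed per node-hour and BigQuery queries billed by the amount of data scanned; the function names and every figure in the usage example are hypothetical placeholders, not real prices, so substitute the rates from the official pricing pages.

```python
def redshift_monthly_cost(hourly_rate_per_node: float, nodes: int, hours: float) -> float:
    """Hourly-rate style: every node is billed for every hour the cluster runs."""
    return hourly_rate_per_node * nodes * hours


def bigquery_monthly_cost(
    tb_scanned: float,
    price_per_tb_scanned: float,
    gb_stored: float,
    price_per_gb_stored: float,
) -> float:
    """Usage style: queries are billed by data scanned, storage by data kept."""
    query_cost = tb_scanned * price_per_tb_scanned
    storage_cost = gb_stored * price_per_gb_stored
    return query_cost + storage_cost


if __name__ == "__main__":
    # Placeholder inputs only: replace them with your own usage and current rates.
    always_on = redshift_monthly_cost(hourly_rate_per_node=1.0, nodes=2, hours=730)
    sporadic = bigquery_monthly_cost(
        tb_scanned=10, price_per_tb_scanned=5.0, gb_stored=100, price_per_gb_stored=0.02
    )
    print(f"Redshift-style estimate: {always_on:.2f}")
    print(f"BigQuery-style estimate: {sporadic:.2f}")
```

The point of the sketch is the shape of the two formulas: the first grows with how long the cluster is up, the second with how much data you actually query and store.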

Simplify Data Migration From BigQuery to Redshift Using Hevo’s No-Code Data Pipeline!

Google BigQuery is a cloud data warehouse known for ingesting data quickly and performing near-real-time analysis. Amazon Redshift, on the other hand, is a fully managed, reliable cloud data warehouse service that offers large-scale storage and analysis of data sets and supports large-scale database migrations.

You can ingest data from your Google BigQuery database using Hevo Pipelines and replicate it to Amazon Redshift.

Method 1: Migrate Data From BigQuery to Redshift (Using Hevo Data)

Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and fast Data Pipelines, you can extract & load data from 100+ Data Sources such as BigQuery straight into a Data Warehouse such as Redshift, or into any database. To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s built-in Transformation Layer without writing a single line of code!

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

Method 2: Migrate Data From BigQuery to Redshift (Manual)

This method is time-consuming and somewhat tedious to implement. Users have to set up two processes themselves: extracting data from BigQuery with an AWS Glue connector and landing it in Amazon S3, from where it can be queried or loaded into Redshift. This method is suitable for users with a technical background.

How to migrate data from BigQuery to Redshift?

Method 1: Migrate Data From BigQuery to Redshift (Using Hevo Data)

Hevo Data is a No-code Data Pipeline platform that automates the direct transfer of data from 100+ data sources (40+ free sources) to Amazon Redshift and other Data Warehouses, BI tools, or any other desired destination. Hevo completely automates the process of not just importing data from your chosen source but also enriching and transforming it into an analysis-ready format, all without requiring you to write a single line of code. Because of its fault-tolerant architecture, data is handled safely and consistently, with no data loss.

Hevo Data covers all of your data preparation requirements, allowing you to concentrate on core business activities such as generating more leads, sustaining client retention, and growing your company’s profitability. It offers a consistent and stable solution for real-time data management, so that analysis-ready data is always available at your selected destination.

The steps to import data from BigQuery to Redshift using Hevo Data are as follows:

Step 1: Connect your Google BigQuery account to Hevo.
Step 2: Select Amazon Redshift as your destination and begin data transfer.

That’s it…

Method 2: Migrate Data From BigQuery to Redshift (Manual)

This section gives a brief overview of the approach we will follow to migrate data from BigQuery to Redshift. The figure below depicts how AWS Glue connects to Google BigQuery for data ingestion.

[Figure: How AWS Glue connects to Google BigQuery for data ingestion. Source: Amazon documentation]

AWS Glue is a fully managed extract, transform, and load (ETL) service that simplifies data preparation and loading for analytics. AWS Glue provides the tools required for data integration, so you can start analyzing your data in minutes rather than weeks or months. AWS Glue custom connectors are a feature of AWS Glue and AWS Glue Studio that lets you move data from SaaS applications and custom data sources to your Amazon S3 data lake.

You can search for and pick connectors from AWS Marketplace with a few clicks and start your data preparation workflow in minutes. You can also create and share custom connectors across teams, as well as incorporate open source Spark connectors and Athena federated query connectors into your data preparation workflows. The AWS Glue Connector for Google BigQuery enables cross-cloud data migration from Google BigQuery to Amazon Simple Storage Service (Amazon S3).

AWS Glue Studio is a graphical interface for creating, running, and monitoring ETL jobs in AWS Glue. You can compose data transformation workflows visually and run them on AWS Glue’s Apache Spark-based serverless ETL engine.

Steps to Migrate Data From BigQuery to Redshift (Manual)

Step 1: Configure Google Account

  1. Download the JSON file containing the service account credentials from Google Cloud.
  2. In the Secrets Manager console, select Store a new secret.
  3. For Secret type, select Other type of secret.
  4. Enter credentials as the key and the contents of the JSON file, as a base64-encoded string, as the value.
  5. Leave the remaining options at their defaults.
  6. Select Next.
  7. Name the secret bigquery_credentials.
  8. Complete the remaining steps to store the secret.
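The base64 step above is easy to get wrong by hand, so here is a minimal sketch of it in Python. It assumes the secret is a single key/value pair whose key is credentials, as described in the steps; the helper names, the default secret name bigquery_credentials, and the file path in the usage example are illustrative assumptions. The optional store_secret helper uses boto3's Secrets Manager client and only works where boto3 and AWS credentials are available.

```python
import base64
import json


def build_secret_string(service_account_json: str) -> str:
    """Return the secret body: key "credentials", value = base64 of the key file."""
    json.loads(service_account_json)  # fail early if the key file is not valid JSON
    encoded = base64.b64encode(service_account_json.encode("utf-8")).decode("ascii")
    return json.dumps({"credentials": encoded})


def store_secret(secret_string: str, name: str = "bigquery_credentials") -> None:
    """Create the secret in AWS Secrets Manager (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_secret_string works without boto3

    client = boto3.client("secretsmanager")
    client.create_secret(Name=name, SecretString=secret_string)


if __name__ == "__main__":
    # Stand-in for the downloaded key file; in practice read it from disk, e.g.
    # open("service-account.json").read()  (hypothetical path).
    sample_key_file = '{"type": "service_account", "project_id": "my-project"}'
    print(build_secret_string(sample_key_file))
```

Doing the encoding in code rather than in a web tool also keeps the service account key off third-party sites.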

Step 2: Adding an IAM role to AWS Glue

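The next step is to create an IAM role with the permissions that the AWS Glue job requires. Attach the AWS managed policies the job needs to the role (for a Glue job that uses a Marketplace connector, these are typically AWSGlueServiceRole and AmazonEC2ContainerRegistryReadOnly).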

Then create and attach a policy that allows the role to read the secret from Secrets Manager and to write to the S3 bucket.

The following sample policy shows the permissions the AWS Glue job needs for this tutorial. Always scope down policies before using them in a production setting. Provide the ARN of the bigquery_credentials secret you created earlier, as well as the S3 bucket where you want to save the BigQuery data:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetDescribeSecret",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "arn:aws:secretsmanager:<<region>>:<<account_id>>:secret:<<your_secret_id>>"
        },
        {
            "Sid": "S3Policy",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetBucketAcl",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<<your_s3_bucket>>",
                "arn:aws:s3:::<<your_s3_bucket>>/*"
            ]
        }
    ]
}
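If you would rather generate this policy than edit the placeholders by hand, the following is a minimal sketch that fills them in from four inputs. The function name and the example values are hypothetical, and the region segment of the secret ARN is filled in explicitly here. Note that a real Secrets Manager ARN ends with a random suffix after the secret name, so it is safest to copy the ARN from the console.

```python
import json


def build_glue_job_policy(region: str, account_id: str, secret_id: str, bucket: str) -> dict:
    """Return the sample policy as a dict, with the placeholders filled in."""
    secret_arn = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_id}"
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetDescribeSecret",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:ListSecretVersionIds",
                ],
                "Resource": secret_arn,
            },
            {
                "Sid": "S3Policy",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetBucketAcl",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                # Bucket-level actions match the bucket ARN, object-level ones the /* form.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Example values are placeholders, not real identifiers.
    policy = build_glue_job_policy(
        "us-east-1", "123456789012", "bigquery_credentials-AbCdEf", "my-bigquery-export"
    )
    print(json.dumps(policy, indent=4))
```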

Step 3: Subscribing to the BigQuery Glue Connector

To subscribe to the connector, follow these steps:

  1. Go to AWS Marketplace and search for the AWS Glue Connector for Google BigQuery.
  2. Select Continue to Subscribe.
  3. Review the terms and conditions, pricing, and other details.
  4. Select Continue to Configuration.
  5. Under Delivery Method, select your delivery method.
  6. Under Software Version, select your software version.
  7. Select Continue to Launch.
  8. Under Usage instructions, select Activate the Glue connector in AWS Glue Studio.
  9. In the Name field, enter a name for your connection (for example, bigquery).
  10. Optionally, select a VPC, subnet, and security group.
  11. For AWS Secret, choose bigquery_credentials.
  12. Select Create connection.

Step 4: Creating the ETL job in AWS Glue Studio

  1. In Glue Studio, select Jobs.
  2. Select BigQuery as the source.
  3. Select S3 as the target.
  4. Select Create.
  5. Select the ApplyMapping node and delete it.
  6. Select the BigQuery node.
  7. For Connection, choose the BigQuery connection you created (for example, bigquery).
  8. Expand Connection options.
  9. Select Add new option.
  10. Add the following key/value pairs:
    1. Key: parentProject, Value: <<google_project_id>>
    2. Key: table, Value: bigquery-public-data.covid19_open_data.covid19_open_data
  11. Select the S3 bucket node.
  12. Choose the format and compression type.
  13. Specify the S3 Target Location.
  14. Select Job details.
  15. For Name, enter BigQuery S3.
  16. For IAM Role, choose the role you created earlier.
  17. For Type, select Spark.
  18. For Glue version, choose Glue 2.0 (supports Spark 2.4, Scala 2, and Python 3).
  19. Leave the remaining options at their defaults.
  20. Select Save.
  21. To run the job, choose the Run Job button.
  22. Once the job run succeeds, check the S3 bucket for data.

In this job, we use the connector to read data from the BigQuery public dataset for COVID-19. More details can be found on GitHub in the Apache Spark SQL connector for Google BigQuery (Beta) repository.
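For readers who prefer to see the job as code, here is a rough sketch of what the visual job amounts to in a Glue PySpark script. It is an assumption about the shape of such a script, not the exact code Glue Studio generates: the helper names are hypothetical, the connection name bigquery is the example name used earlier, and Parquet is just one of the formats you could pick. The second function only runs inside an AWS Glue job, where a GlueContext is available.

```python
def bigquery_connection_options(
    parent_project: str, table: str, connection_name: str = "bigquery"
) -> dict:
    """Build the key/value options entered in Glue Studio for the BigQuery node."""
    return {
        "parentProject": parent_project,    # Google Cloud project ID
        "table": table,                     # fully qualified BigQuery table
        "connectionName": connection_name,  # Glue connection created from the connector
    }


def copy_table_to_s3(glue_context, options: dict, s3_path: str, fmt: str = "parquet") -> None:
    """Read through the Marketplace connector and write the result to S3.

    glue_context is expected to be an awsglue GlueContext created by the job.
    """
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",
        connection_options=options,
        transformation_ctx="bigquery_source",
    )
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format=fmt,
        transformation_ctx="s3_target",
    )


if __name__ == "__main__":
    # "my-google-project" is a placeholder for <<google_project_id>>.
    opts = bigquery_connection_options(
        "my-google-project",
        "bigquery-public-data.covid19_open_data.covid19_open_data",
    )
    print(opts)
```

Keeping the options in a small helper makes it easy to reuse the same job for other tables by changing only the table value.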

Step 5: Querying the Data

You can now crawl the data in the S3 bucket using AWS Glue crawlers, which will generate a table called covid. You can then query this data in Athena. The screenshot below depicts the results of our query.

[Figure: Athena query results for the covid table. Source: Amazon documentation]
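With the data in S3, the last hop into Redshift is usually a COPY command. The sketch below only builds SQL strings: a preview query for the covid table in Athena, and a Redshift COPY statement. It assumes the Glue job wrote Parquet files, that the target Redshift table already exists with matching columns, and that the IAM role shown is attached to your Redshift cluster; the function names, table name, bucket path, and role ARN are all hypothetical placeholders.

```python
def athena_preview_sql(table: str = "covid", limit: int = 10) -> str:
    """Small sanity-check query for the table the crawler created."""
    return f'SELECT * FROM "{table}" LIMIT {int(limit)};'


def redshift_copy_sql(target_table: str, s3_path: str, iam_role_arn: str) -> str:
    """Redshift COPY statement that loads Parquet files from an S3 prefix."""
    return (
        f"COPY {target_table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    print(athena_preview_sql())
    print(
        redshift_copy_sql(
            "public.covid",                                      # placeholder table
            "s3://my-bigquery-export/covid/",                    # placeholder prefix
            "arn:aws:iam::123456789012:role/my-redshift-role",   # placeholder role
        )
    )
```

You would run the generated COPY statement from any SQL client connected to the Redshift cluster. If you chose a different output format in the Glue job, the FORMAT clause needs to change accordingly.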

Conclusion

In this article, we covered some data warehouse basics and discussed two methods to migrate data from BigQuery to Redshift:

With the manual method, we walked through the migration in detail using AWS Glue. This approach requires users to have a sound understanding of Redshift, BigQuery, and the migration process, which leaves room for a new user to make mistakes.

With the Hevo Data method, the BigQuery to Redshift data migration process was much faster, fully automated, and required no code. Hevo also provides a pre-built Native REST API Connector that allows you to integrate data from a plethora of custom and non-native sources without writing a single line of code.

Businesses can use automated platforms like Hevo Data to set up this integration and handle the ETL process. It helps you transfer data directly from a source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner, without having to write any code.

Moreover, Hevo offers a fully-managed solution to set up data integration from 100+ other data sources (including 30+ free data sources) and lets you load data directly to the destination of your choice. It can automate your data flow in minutes without a single line of code.

Not sure about purchasing a plan? Sign Up for a 14-day full feature access trial and simplify your Data Ingestion & Integration process. You can also check out our unbeatable pricing and decide the best plan for your needs. 

Let us know what you think about the ways discussed to migrate data from BigQuery to Redshift in the comments section below, and if you have anything to add, please do so.
