This article presents two methods, with a step-by-step approach, to help you migrate data from Amazon Redshift to Google BigQuery. You can leverage both ways (one at a time) to achieve the desired results. Let’s begin…
Data warehouses are the single source of data truth. They are subject-oriented, integrated, time-variant, non-volatile, and summarized. To truly embrace the idea of digital transformation, having a data warehouse is a must. Amazon Redshift and Google BigQuery are two of the world's most popular data warehouses. Both are built to scale and to help enterprises plan for the economies of tomorrow.
Data warehouses are a critical component of the modern data stack and serve an organization's general data and analytics needs. Choosing the right cloud-based data warehouse is a key requirement for any enterprise that wants to support its analysis, reporting, and BI functions. Data warehouses deliver enhanced business intelligence through on-demand, scalable computing infrastructure that improves consistency and provides a competitive advantage in terms of speed and enablement.
Both Amazon Redshift and Google BigQuery are used widely across a company's business functions, from BI to trend forecasting and budgeting. Both are fully managed, petabyte-scale cloud data warehouses, and each has strengths and weaknesses. The better choice is the one that best matches your needs and expectations.
What is Amazon Redshift?
Amazon Redshift is built and designed for today’s data scientists, data analysts, data administrators, and software developers. It gives users the ability to perform operations on billions of rows, making Redshift perfect for analyzing large quantities of data. Its architecture is based on an extensive communication channel between the client application and the data warehouse cluster. To learn more about Amazon Redshift infrastructure, see Redshift documentation.
Important Amazon Redshift Features:
- AWS's Integrated Analytics Ecosystem: AWS offers a wide range of built-in ecosystem services, making it easier to handle end-to-end analytics workflows without compliance and operational roadblocks. Well-known examples include AWS Lake Formation, AWS Glue, AWS EMR, AWS DMS, and the AWS Schema Conversion Tool.
- Redshift ML: A must for today's data professionals, Redshift ML allows users to create and train Amazon SageMaker models on data from Redshift for predictive analytics.
- Machine Learning For Maximum Performance: Amazon Redshift offers advanced ML capabilities that deliver high throughput and performance. Its algorithms predict incoming queries based on certain factors to help prioritize critical workloads.
What is Google BigQuery?
Google BigQuery is a fully managed, serverless data warehouse. Similar to Amazon Redshift, it allows users to run analysis over petabytes of data in real time. It's cost-effective and only requires users to understand and write standard SQL.
Important Google BigQuery Features:
- BigQuery Omni: BigQuery Omni offers a consistent data experience and helps break down data silos. It allows users to run queries on data held in other cloud platforms, which comes in handy when analyzing data across clouds such as AWS and Microsoft Azure.
- BigQuery ML: Using simple SQL queries, users can create and execute ML models in BigQuery. Widely used models include Linear Regression, Binary and Multiclass Logistic Regression, Matrix Factorization, Time Series, and Deep Neural Network models.
- BigQuery Data Transfer Service: It automates the data movement into BigQuery, which comes in handy when multiple data sources, including data warehouses, are involved.
Hevo Data is a no-code data pipeline platform that helps new-age businesses integrate their data from multiple source systems to a data warehouse and plug this unified data into any BI tool or Business Application. The platform provides 100+ ready-to-use integrations with a range of data sources and is trusted by thousands of data-driven organizations from 30+ countries.
Check out some of Hevo’s interesting features:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has built-in integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management. It automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Amazon Redshift to Google BigQuery Migration
Method 1: Amazon Redshift to Google BigQuery Migration Using the Console
Before you begin the migration from Amazon Redshift to Google BigQuery, a few requirements and permissions need to be in place.
For Google Cloud, you first have to meet the prerequisites and receive the permissions listed below. To gain access to the Amazon Redshift cluster, you will also need an AWS access key pair, which is used later in the process. To obtain the AWS access key pair, follow the steps provided in the Redshift documentation.
Before setting up the migration, you also need the JDBC URL, username, and password of your Amazon Redshift database, and the URI of the Amazon S3 bucket.
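It can help to collect these values in one place and sanity-check them before opening the console. The following is a minimal sketch only: the class name, field names, and checks are illustrative assumptions, not part of any Google or AWS tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedshiftSourceDetails:
    """Hypothetical container for the values gathered before the transfer."""
    jdbc_url: str
    username: str
    password: str
    s3_uri: str
    access_key_id: str
    secret_access_key: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means nothing obvious is wrong."""
        issues = []
        # Redshift JDBC URLs use the jdbc:redshift:// scheme.
        if not self.jdbc_url.startswith("jdbc:redshift://"):
            issues.append("jdbc_url should start with jdbc:redshift://")
        # S3 locations are written as s3://bucket/optional-prefix.
        if not self.s3_uri.startswith("s3://"):
            issues.append("s3_uri should start with s3://")
        for name in ("username", "password", "access_key_id", "secret_access_key"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues
```

These checks only catch formatting slips. They do not confirm that the credentials are valid or that the cluster is reachable.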
The required permissions for Google BigQuery are as follows:
- Permissions to create the transfer: bigquery.transfers.update
- Permissions to the target dataset: bigquery.datasets.get and bigquery.datasets.update
Note: All three permissions are included in bigquery.admin, a predefined IAM (Identity and Access Management) role. To learn more about IAM roles, see the Access control reference section of the BigQuery Data Transfer Service guide. To confirm that your AWS user has the configuration and permissions the transfer requires, review the relevant AWS managed policy.
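As a quick illustration, the permission check can be expressed as a set difference. This is a sketch under one assumption: you already have the list of permissions granted to your account from wherever you inspect your IAM role. The function name is hypothetical.

```python
# The three BigQuery permissions named above.
REQUIRED_PERMISSIONS = frozenset({
    "bigquery.transfers.update",
    "bigquery.datasets.get",
    "bigquery.datasets.update",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Example: an account that can only read the dataset.
print(missing_permissions(["bigquery.datasets.get"]))
```

An empty result means the account holds everything listed above.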
You also need to meet the requirements defined by Google Cloud. Complete the following steps:
- To store your migration data, choose or create a Google Cloud project: Go to the project selector, then choose either 'SELECT PROJECT' or 'CREATE PROJECT.'
- Enable the BigQuery Data Transfer Service API in the Google Cloud Console by clicking on the ‘Enable’ button (A green checkmark will indicate that the API is enabled).
- To store data, create a BigQuery dataset.
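If you prefer the command line to the console for the last two steps, they might be scripted as below. This is a hedged sketch: it only builds the gcloud and bq commands as argument lists and prints them, and the project ID, dataset ID, and location are placeholder assumptions. Check the commands against the current Google Cloud documentation before running them.

```python
import shlex


def setup_commands(project_id, dataset_id, location="US"):
    """Build (but do not run) the commands for enabling the API and creating a dataset."""
    enable_api = [
        "gcloud", "services", "enable",
        "bigquerydatatransfer.googleapis.com",
        f"--project={project_id}",
    ]
    create_dataset = [
        "bq", f"--location={location}", "mk", "--dataset",
        f"{project_id}:{dataset_id}",
    ]
    return [enable_api, create_dataset]


# Placeholder identifiers; substitute your own project and dataset.
for command in setup_commands("my-project", "redshift_migration"):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run once you have reviewed it.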
Note: For the Amazon Redshift cluster, you have to allowlist the IP addresses that correspond to your dataset's location. The full list of IP addresses is provided in the 'Grant access to your Amazon Redshift cluster' section of the Google Cloud documentation. After this step, you have to gain access to the Amazon S3 bucket by creating an AWS access key, as discussed above.
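The allowlisting step can be prepared with a small helper that turns the documented addresses into inbound rules. This is a sketch with several assumptions: the addresses below are placeholders from a reserved documentation range and not Google's real ones, 5439 is Redshift's default port and your cluster may use another, and the rule dictionary is an illustrative shape, not an AWS API payload.

```python
import ipaddress

REDSHIFT_DEFAULT_PORT = 5439  # default Redshift port; adjust if your cluster differs


def ingress_rules(addresses, port=REDSHIFT_DEFAULT_PORT):
    """Validate each address and describe one inbound TCP rule per address."""
    rules = []
    for raw in addresses:
        # ip_network raises ValueError on malformed input; a bare IP becomes a /32.
        network = ipaddress.ip_network(raw.strip(), strict=False)
        rules.append({"protocol": "tcp", "port": port, "cidr": str(network)})
    return rules


# Placeholder addresses (TEST-NET-3); copy the real list from Google's documentation.
for rule in ingress_rules(["203.0.113.10", "203.0.113.11"]):
    print(rule)
```

Validating up front catches copy-and-paste errors before the rules reach the cluster's security group.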
Let’s start with the migration process…
Step 1: Open the BigQuery page on the Google Cloud Console
Step 2: On the left, under the Analysis section, click on ‘Data transfers’
Step 3: Click on ‘CREATE A TRANSFER’
Step 4: Select 'Migration: Amazon Redshift' as the 'Source,' and enter the migration name in the 'Transfer config name' box. Under 'Schedule options,' choose between the 'Start now' and 'Start at set time' radio buttons
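Step 5: Enter the dataset ID in the 'Destination settings' box, then fill in the 'Data source details' with the values gathered earlier: the JDBC URL, username, and password of your Amazon Redshift database, the AWS access key pair, and the URI of the Amazon S3 bucket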
Step 6: This step is optional. You can enable email notifications so that you are alerted if a transfer run fails. Next, click on 'Save' to continue
Step 7: After successful execution, the Cloud Console displays all the transfer setup details, including the Resource name
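The same transfer can, in principle, be created from the command line with bq mk --transfer_config. The sketch below only assembles that command. The data source ID (redshift) and the parameter keys reflect our understanding of what the BigQuery Data Transfer Service expects for a Redshift migration and should be treated as assumptions to verify against Google's documentation. All values shown are placeholders.

```python
import json


def transfer_params(jdbc_url, username, password, access_key_id,
                    secret_access_key, s3_uri, schema="public",
                    table_name_patterns=None):
    """Collect the data source details into the params object for the transfer."""
    params = {
        "jdbc_url": jdbc_url,
        "database_username": username,
        "database_password": password,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "s3_bucket": s3_uri,
        "redshift_schema": schema,
    }
    if table_name_patterns:
        params["table_name_patterns"] = table_name_patterns
    return params


def bq_mk_transfer_command(project_id, dataset_id, display_name, params):
    """Build (but do not run) the bq command that would create the transfer config."""
    return [
        "bq", "mk", "--transfer_config",
        f"--project_id={project_id}",
        "--data_source=redshift",
        f"--target_dataset={dataset_id}",
        f"--display_name={display_name}",
        f"--params={json.dumps(params)}",
    ]


# Placeholder values only; never hard-code real credentials.
params = transfer_params(
    "jdbc:redshift://example-host:5439/dev", "db_user", "db_password",
    "ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "s3://example-bucket/staging",
)
command = bq_mk_transfer_command("my-project", "redshift_migration",
                                 "redshift-to-bq", params)
print(command[:-1])  # omit the params argument so secrets are not printed
```

Keeping the command as a list, not a single shell string, avoids quoting problems with the JSON payload when it is later passed to subprocess.run.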
Method 2: Procedure to Enable Amazon Redshift to Google BigQuery Migration Using Hevo
Hevo provides Google BigQuery as a Destination for loading data from any Source system, including Amazon Redshift. Refer to Hevo's documentation for the permissions, user authentication, and prerequisites for Google BigQuery as a Destination.
Step 1: Set up Amazon Redshift as a Source by following these four steps:
- In the Asset Palette on the left-hand side of the Hevo UI, click on 'PIPELINES'
- Under Pipelines List View, click on ‘+CREATE’
- Select Amazon Redshift as the source on the ‘Select Source Type’ page
- On the 'Configure your Amazon Redshift Source' page that appears, enter the connection details for your Redshift database
Step 2: Set up Google BigQuery as a Destination by following these five steps:
- In the Asset Palette on the left-hand side of the Hevo UI, click on 'Destinations'
- Under ‘Destination List View’ click on ‘+CREATE’
- Select Google BigQuery as the Destination type on the 'Add Destination' page
- Select one of the two authentication methods presented on the 'Configure your Google BigQuery Account' page
- After choosing one and granting the required permissions, continue to the 'Configure your Google BigQuery Warehouse' page and specify the requested details
In this article, we covered some basics of data warehouses and discussed two methods to migrate data from Amazon Redshift to Google BigQuery:
In the first method, we walked through the manual way to migrate data. This approach requires users to have a sound understanding of Redshift, BigQuery, and their migration procedures, which leaves the door open for a new user to make mistakes.
In the second method, we used Hevo Data to achieve the same result. Through Hevo, the process was much faster, fully automated, and required no code. Hevo provides a pre-built Native REST API Connector that will allow you to integrate data from a plethora of custom and non-native sources without writing a single line of code.
Experience excellence first hand. Sign Up here for a 14-day free trial of Hevo’s feature-rich suite. Also, check our pricing. It will help you choose the right plan for your business needs.