Businesses today generate huge amounts of data on a regular basis, and this raw data is essential for making informed decisions. However, the process comes with a few major challenges. Analyzing data at this scale (Petabytes) with a traditional database like MySQL, Oracle, or SQL Server is very difficult. To get any tangible insight from this data, you need to move it to a Data Warehouse like Google BigQuery. This post provides a step-by-step walkthrough on how to migrate data from Amazon Aurora to the BigQuery Data Warehouse using 2 methods. Read along and decide which method suits you best!

Performing ETL from Amazon Aurora to BigQuery

Method 1: Using Custom Code to Move Data from Aurora to BigQuery

This method consists of a 5-step process to move data from Amazon Aurora to BigQuery through custom ETL Scripts. There are various advantages of using this method but a few limitations as well.

Method 2: Using Hevo Data to Move Data from Aurora to BigQuery

Hevo Data can load your data from Aurora to BigQuery in minutes, for free, and without you writing a single line of code. Data loading is configured on a visual, point-and-click interface. Since Hevo is fully managed, you do not have to invest any additional time and resources in maintaining and monitoring the data. Hevo promises 100% data consistency and accuracy.

Sign up here for a 14-day Free Trial!

Methods to Connect Aurora to BigQuery

Here are the methods you can use to connect Aurora to BigQuery in a seamless fashion:

Method 1: Using Custom Code to Move Data from Aurora to BigQuery
Method 2: Using Hevo Data to Move Data from Aurora to BigQuery

In this post, we will cover the first method (Custom Code) in detail. Towards the end of the post, you can also find a quick comparison of both data replication methods so that you can evaluate your requirements and choose wisely.

Method 1: Using Custom Code to Move Data from Aurora to BigQuery

This method requires you to manually set up the data transfer process from Aurora to BigQuery. The steps involved in migrating data from Aurora DB to BigQuery are as follows:

Step 1: Getting Data out of Amazon Aurora

You can export data from Aurora by writing SQL queries. SELECT queries let you pull exactly the data you want: you can apply filters, control the ordering, and limit the number of results.

A command-line tool called mysqldump lets you export entire tables and databases in a format of your choice (e.g., delimited text, CSV, or SQL statements). The example below uses the mysql client in batch mode to export a table and convert the tab-separated output to CSV:

mysql -u user_name -p --database=db_name --host=rds_hostname --port=rds_port --batch -e "select * from table_name" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > file_name.csv
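
If you want a full export of a table or a whole database rather than a query result, a minimal mysqldump sketch (using the same placeholder host and credentials as above) looks like the following. Note that a SQL dump is handy as a complete export, but the CSV produced by the mysql client above is what you would stage for BigQuery:

# Sketch: dump an entire table as SQL statements with mysqldump.
# --single-transaction takes a consistent snapshot without locking the table.
mysqldump -u user_name -p --host=rds_hostname --port=rds_port \
  --single-transaction db_name table_name > table_name.sql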

Step 2: Preparing Amazon Aurora Data

You need to make sure that the target BigQuery table is perfectly aligned with the source Aurora table, specifically the column order and the data type of each column.
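
As a minimal sketch, assume a hypothetical Aurora table orders(id INT, customer_name VARCHAR, amount DECIMAL, created_at DATETIME). A matching BigQuery schema file could look like the one below, with columns kept in the same order as the exported CSV:

# Hypothetical schema file for the example table orders(id, customer_name, amount, created_at).
# Aurora/MySQL types are mapped to BigQuery types: INT -> INTEGER, VARCHAR -> STRING,
# DECIMAL -> NUMERIC, DATETIME -> DATETIME. Column order matches the CSV export.
cat > orders_schema.json <<'EOF'
[
  {"name": "id",            "type": "INTEGER",  "mode": "REQUIRED"},
  {"name": "customer_name", "type": "STRING",   "mode": "NULLABLE"},
  {"name": "amount",        "type": "NUMERIC",  "mode": "NULLABLE"},
  {"name": "created_at",    "type": "DATETIME", "mode": "NULLABLE"}
]
EOF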

Step 3: Upload Data to Google Cloud Storage

Once the data has been extracted from the Aurora database, the next step is to upload it to GCS (Google Cloud Storage), which acts as the staging area for the BigQuery load. There are multiple ways to do this, and they are explained below. In Step 4, the bq command-line tool is then used to load the staged files into your BigQuery datasets along with schema and data type information; you can find the full bq syntax in the GCP quickstart guide. Iterate through this process as many times as it takes to load all of your tables into BigQuery.

(A) Using Gsutil

The gsutil utility lets you upload a local file to a GCS (Google Cloud Storage) bucket.

To copy a file to GCS:

gsutil cp local_copy.csv  gs://gcs_bucket_name/path/to/folder/

To copy an entire folder to GCS:

gsutil cp -r local_dir_name gs://gcs_bucket_name/path/to/parent_folder/

(B) Using Web console

Alternatively, you can upload the data from your local machine to GCS using the web console. To do so, follow the steps laid out below:

1. Log in to your GCP account (you need a working Google account to use GCP). From the navigation menu, click on Storage and open the Browser tab on the left.

2. Create a new bucket to upload your data. Make sure the name you choose is globally unique.

3. Click on the bucket name you created in step 2. This will prompt you to browse for the file on your local machine.

4. Choose the file and click on the Upload button. A progress bar appears; wait for the upload to complete. You can then see the file loaded in the bucket.
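
Whichever upload method you use, you can optionally verify from the command line that the file landed in the bucket. A quick check with gsutil, using the same placeholder bucket and folder names as above:

# List the uploaded objects (with sizes and timestamps) to confirm the upload succeeded.
gsutil ls -l gs://gcs_bucket_name/path/to/folder/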

Step 4: Upload to BigQuery from GCS

You can load data into BigQuery from GCS using two methods: (A) the web console UI and (B) the command line.

(A) Uploading the data using the web console UI:

1. Go to BigQuery from the navigation menu.

2. In the UI, click on Create dataset and provide the dataset name and location.

3. Then click on the name of the dataset you created. Click on the Create table option and provide the project name, dataset name, table name, and table type.
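
If you prefer to script these console steps, the bq tool can also create the dataset and an empty table. A rough sketch, assuming the placeholder project, dataset, and table names used throughout this post and the JSON schema file prepared earlier:

# Command-line equivalent of the console steps above:
# create a dataset, then an empty table from a JSON schema file.
bq --location=US mk --dataset your_project:your_dataset
bq mk --table your_project:your_dataset.your_table ./your_schema.json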

(B) Loading the data using the command line

To open the command-line tool, click on the Cloud Shell icon in the top toolbar of the GCP console.

The syntax of the bq command to load a file into a BigQuery table is:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

[LOCATION] is an optional flag for the location of your dataset, e.g. “US” or “us-east1”.

[FORMAT] is the format of the source file; set it to CSV to load a CSV file.

[DATASET] is the target dataset name.

[TABLE] is the target table name to load the data into.

[PATH_TO_SOURCE] is the path to the source file in the GCS bucket.

[SCHEMA] specifies the table schema, either inline or as a path to a JSON schema file.

Note: Instead of supplying a schema, you can pass the --autodetect flag so that BigQuery detects the table schema automatically.

You can specify your schema explicitly on the bq command line:

bq --location=US load --source_format=CSV your_dataset.your_table gs://your_bucket/your_data.csv ./your_schema.json

Your target table schema can also be autodetected:

bq --location=US load --autodetect --source_format=CSV your_dataset.your_table gs://mybucket/data.csv

The BigQuery command-line interface offers 3 options for writing to an existing table:

  1. Overwrite the table
    bq --location=US load --autodetect --replace --source_format=CSV your_target_dataset_name.your_target_table_name gs://source_bucket_name/path/to/file/source_file_name.csv
  2. Append data to the table
    bq --location=US load --noreplace --source_format=CSV your_target_dataset_name.your_target_table_name gs://source_bucket_name/path/to/file/source_file_name.csv ./schema_file.json
  3. Add new fields to the target table
    bq --location=US load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV your_target_dataset.your_target_table gs://bucket_name/source_data.csv ./target_schema.json
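
Whichever option you use, a quick row-count check against the loaded table helps confirm the load behaved as expected. For example, with the same placeholder dataset and table names:

# Sanity check: count the rows that ended up in the target table.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM your_target_dataset_name.your_target_table_name'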

Step 5: Update the Target Table in BigQuery

The loads performed in the steps above do not update the target table directly. Just as GCS acts as a staging area for the upload, the data is first loaded into an intermediate table in BigQuery, and the target table is then updated from it.

There are two ways of updating the final table, as explained below (a single-statement MERGE alternative is sketched after the list):

  1. Update the existing rows in the final table, then insert the new rows from the intermediate table
    UPDATE data_set_name.target_table t SET value = s.value FROM data_set_name.intermediate_table s WHERE t.id = s.id;
    INSERT data_set_name.target_table (id, value) SELECT id, value FROM data_set_name.intermediate_table WHERE id NOT IN (SELECT id FROM data_set_name.target_table);
  2. Delete from the final table all the rows that are present in the intermediate table, then insert all the newly loaded rows from the intermediate table. Here the intermediate table is loaded in truncate-and-load mode
    DELETE FROM data_set_name.target_table WHERE id IN (SELECT id FROM data_set_name.intermediate_table);
    INSERT data_set_name.target_table (id, value) SELECT id, value FROM data_set_name.intermediate_table;
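
Both patterns can also be collapsed into a single statement with BigQuery's MERGE, which updates matching rows and inserts new ones in one operation. A minimal sketch using the same hypothetical id and value columns, run through the bq CLI:

# Single-statement upsert from the intermediate table into the target table.
bq query --use_legacy_sql=false '
MERGE data_set_name.target_table t
USING data_set_name.intermediate_table s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value)'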

That’s it! Your Amazon Aurora to Google BigQuery data transfer process is complete.

Limitations of using Custom Code to Move Data from Aurora to BigQuery

The manual approach will allow you to move your data from Amazon Aurora to BigQuery successfully. However, it suffers from the following limitations:

  • Writing custom code makes sense only if you are looking for a one-time data migration from Amazon Aurora to BigQuery.
  • When data needs to be migrated on an ongoing basis or in real time, it has to be moved incrementally. The custom ETL code above does not handle this; you would need to write considerably more code to achieve real-time, incremental migration.
  • The custom code is likely to break if the source schema changes.
  • If you later find that transformations need to be applied to the data, you will need extra time and resources to build them.
  • Since you developed the custom migration code yourself, you also have to maintain it to keep meeting your business goals.
  • In the custom code approach, you have to focus on both the business and the technical details.
  • Hand-written ETL code is fragile; a single failure can break the entire process and lead to inaccurate or delayed data in BigQuery.

Method 2: Using Hevo Data to Move Data from Aurora to BigQuery

Using a fully managed, easy-to-use Data Pipeline platform like Hevo, you can load your data from Aurora to BigQuery in a matter of minutes. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Get Started with Hevo for free

This can be achieved in a code-free, point-and-click visual interface. Here are simple steps to replicate Amazon Aurora to BigQuery using Hevo:

  • Step 1: Connect to your Aurora DB by providing the proper credentials.
  • Step 2: Select one of the following replication modes:
    1. Full dump (load all tables)
    2. Load data from Custom SQL Query
    3. Fetch data using BinLog
  • Step 3: Complete Aurora to BigQuery Migration by providing information about your Google BigQuery destination such as the authorized Email Address, Project ID, etc.

About Amazon Aurora

Amazon Aurora is a popular relational database service developed by Amazon. It is one of the most widely used databases for low-latency data storage and processing. Aurora runs on cloud infrastructure and is compatible with MySQL and PostgreSQL, providing performance and accessibility similar to traditional databases at a relatively low price. It is also simple to use and comes with Amazon's security and reliability features.

Aurora is primarily used as a transactional or operational database and offers better performance at a more cost-effective price than a self-managed MySQL deployment. It is specifically not recommended for analytics workloads.

About Google BigQuery

BigQuery is a Google-managed, cloud-based data warehouse service. It is designed to store, process, and analyze very large volumes (Petabytes) of data and to make data analysis faster and more accurate. BigQuery is known for delivering quick results at minimal cost with great performance. Since the infrastructure is managed by Google, you as a developer, data analyst, or data scientist can focus on uncovering meaningful insights using standard SQL.

Conclusion

This blog talks about the two methods you can implement to move data from Aurora to BigQuery in a seamless fashion.

Visit our Website to Explore Hevo

With Hevo, you can achieve simple and efficient Data Replication from Aurora to BigQuery. Hevo can help you move data from not just Aurora DB but 100s of additional data sources.

Sign Up for a 14-Day Free Trial with Hevo and enjoy a seamless, hassle-free data loading experience from Aurora DB to Google BigQuery. You can also have a look at our unbeatable pricing, which will help you choose the right plan for your business needs.

Share your understanding of the Amazon Aurora BigQuery Integration in the comments below!
