You have been running your transaction workload on MySQL for years. Occasional simple analytical queries and reporting using materialized views or ad hoc SQL in MySQL were sufficient. However, things are changing as businesses become more digital: the volume of data to be analyzed grows day by day, new data sources such as clickstreams and page views keep appearing, and results have to be generated instantly for deeper analysis. This is where an analytical solution like BigQuery comes into play. This post walks through the steps you need to take to move data from MySQL to BigQuery.
How to move data from MySQL to BigQuery?
There are two methods that can help you replicate data from MySQL to BigQuery:
Method 1: A ready-to-use Data Integration Platform like Hevo (7 Days Free Trial).
Method 2: Build Custom ETL Scripts to replicate data from MySQL to BigQuery.
BigQuery is an enterprise data warehouse solution from Google Cloud Platform (GCP). It stands out from other data warehouse solutions in several respects:
- Serverless – No need to worry about cluster sizing or scaling. BigQuery will automatically scale up and down based on data volume and query complexity.
- Pay for what you consume – Storage cost is similar to other cloud storage offerings ($0.020 per GB per month) and query cost is $5/TB. The latest pricing details are available here.
- Streaming support – BigQuery supports streaming inserts into a table without affecting query performance.
- Secure – All the data is encrypted by default. Security is not optional.
- Support for nested data – BigQuery natively supports analyzing semi-structured data like JSON. There is no need to flatten JSON into a relational schema.
- ML support – GCP is adding ML modules on top of BigQuery. ML models can be created and trained using SQL! Read more about it in BigQuery documentation.
You can visit their official page to know more about BigQuery features.
Moving Data From MySQL to BigQuery
There are two well-accepted ways to move data from MySQL to BigQuery using ETL Scripts.
Full dump and load
This is a relatively simple approach, where the complete data from the source MySQL table is extracted and migrated to BigQuery. If the target table already exists, drop it and create a new table (or delete the complete data and insert the newly extracted data).
Full dump and load is the only option for the first load, even if the incremental load approach is used for recurring loads. For relatively small tables, the full load approach can also be used for recurring loads.
As shown in the diagram, high-level steps to be followed to replicate MySQL to BigQuery are:
- Extract data from MySQL
- Clean and Transform
- Upload to Google Cloud Storage (GCS)
- Upload to the BigQuery table from GCS using bq tool or Console or any cloud SDK.
Let’s take a detailed look at each step.
a. Extract data from MySQL
There are two popular ways to extract data from MySQL – using mysqldump and using SQL query.
Extract data using mysqldump
mysqldump is a client utility that ships with the MySQL installation. It is mainly used to create a logical backup of a database or table. Here is how it can be used to extract one table:
mysqldump -u <db_username> -h <db_host> -p db_name table_name > table_name.sql
The output file table_name.sql will contain INSERT statements like:
INSERT INTO table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...);
This output has to be converted into a CSV file, which requires a small script. A well-known Python script that does exactly this is mysqldump_to_csv.py.
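For very simple dumps, the conversion step can even be sketched with sed alone. This toy version (file names are placeholders) handles only single-row INSERT statements with no embedded commas, quotes, or parentheses in the values; for real data, prefer a proper parser such as mysqldump_to_csv.py.

```shell
# Create a sample dump line to illustrate (a real table_name.sql comes from mysqldump).
printf "INSERT INTO table_name (id, name) VALUES (1,'alice');\n" > table_name.sql

# Strip the INSERT wrapper, keep the value list, then drop the single quotes.
sed -n 's/^INSERT INTO [^(]*([^)]*) VALUES (\(.*\));$/\1/p' table_name.sql \
  | sed "s/'//g" > table_name.csv

cat table_name.csv   # 1,alice
```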
Alternatively, you can create CSV files directly using the command mentioned below. However, this works only when mysqldump is run on the same machine as the mysqld server, which is usually not the case.
mysqldump -u [username] -p -t -T/path/to/directory [database] --fields-terminated-by=,
Extract data using SQL query
The mysql client utility can be used to run SQL commands and redirect the output to a file.
mysql -B -u user database_name -h mysql_host -e "select * from table_name;" > table_name_data_raw.txt
Further, it can be piped with text editing utilities like sed or awk to clean and format data.
mysql -B -u user database_name -h mysql_host -e "select * from table_name;" | sed 's/"/""/g;s/\t/","/g;s/^/"/;s/$/"/' > table_name_data.csv
b. Clean and Transform
Apart from transforming data for business logic, there are some basic things to keep in mind:
- BigQuery expects CSV data to be UTF-8 encoded.
- Primary key and unique key constraints are not enforced by BigQuery. The ETL process has to take care of that.
- Column types are slightly different. Most types have either an equivalent or a convertible type. Here is a list of common data types:
| MySQL Data Type | BigQuery Data Type |
|---|---|
| ENUM | No ENUM type. Use any type that can represent the ENUM values (e.g. STRING) |
| SET | No SET type. Use any type that can represent the SET values (e.g. STRING) |
Fortunately, the default DATE format in MySQL is the same, YYYY-MM-DD, so no specific changes are needed while taking the mysqldump. If you are using a string field to store dates and want to convert it to a date while moving to BigQuery, you can use MySQL's STR_TO_DATE function.
DATE value must be dash(-) separated and in the form YYYY-MM-DD (year-month-day). You can visit their official page to know more about BigQuery data types.
Syntax: STR_TO_DATE(str, format)
Example: SELECT STR_TO_DATE('31,12,1999', '%d,%m,%Y');
Result: 1999-12-31
- The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
- Make sure text columns are quoted if they can potentially contain delimiter characters.
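The encoding requirement above is worth checking before upload. Here is a minimal sketch of the UTF-8 conversion with iconv (file names are placeholders, and the Latin-1 source encoding is an assumption; check yours with the file utility or your MySQL character-set settings):

```shell
# Simulate a Latin-1 export; a real file would come from the extraction step above.
# \351 is the Latin-1 byte for 'é'.
printf 'id,name\n1,caf\351\n' > table_name_data.csv

# Convert Latin-1 -> UTF-8 so BigQuery reads the CSV correctly.
iconv -f LATIN1 -t UTF-8 table_name_data.csv > table_name_data_utf8.csv
```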
c. Upload to Google Cloud Storage (GCS)
gsutil is a command-line tool for manipulating objects in GCS. It can be used to upload files from different locations to your GCS bucket.
To copy a file to GCS:
gsutil cp table_name_data.csv gs://my-bucket/path/to/folder/
To copy an entire folder:
gsutil cp -r dir gs://my-bucket/path/to/parent/
If the files are present in S3, the same command can be used to transfer to GCS.
gsutil cp -R s3://bucketname/source/path gs://bucketname/destination/path
Storage Transfer Service
Storage Transfer Service from Google Cloud is another option to upload files to GCS from S3 or other online data sources like HTTP/HTTPS locations. The destination or sink is always a Cloud Storage bucket. It can also be used to transfer data from one GCS bucket to another.
This service is extremely handy when it comes to data movement to GCS, with support for:
- Scheduling one-time or recurring data transfers.
- Deleting existing objects in the destination if no corresponding source object is present.
- Deleting source objects after transferring.
- Periodic synchronization between source and sink, with advanced filters based on file creation dates, file names, etc.
Upload from Web Console
If you are uploading from your local machine, the web console UI can also be used to upload files to GCS. Here are the steps to upload a file to GCS:
- Log in to your GCP account. In the left bar, click Storage and go to Browser.
- Select the GCS bucket to which you want to upload the file. Here the bucket we are using is test-data-hevo. Click on the bucket.
- On the bucket details page shown below, click the Upload files button and select the file from your system.
- Wait until the upload is completed. The uploaded file will now be listed in the bucket:
d. Upload to the BigQuery table from GCS
bq is the command-line tool to interact with BigQuery. It is extremely convenient for uploading data to a table from GCS. Use the bq load command, and specify CSV as the source_format.
General syntax of bq load:
bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]
[LOCATION] is your location. This is optional.
[FORMAT] is CSV.
[DATASET] is an existing dataset.
[TABLE] is the name of the table into which you’re loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema can be a local JSON file or typed inline. The --autodetect flag can also be used instead of supplying a schema definition.
There are a bunch of options specific to CSV data load:

| CSV option | Description | bq command flag |
|---|---|---|
| Field delimiter | Field delimiter: comma, tab, pipe, other | -F or --field_delimiter |
| Header rows | Header rows to skip | --skip_leading_rows |
| Number of bad records allowed | Number of errors allowed | --max_bad_records |
| Newline characters | Allow quoted newlines | --allow_quoted_newlines |
| Custom null values | Specifies a string that represents a null value in a CSV file | --null_marker |
| Trailing optional columns | Allow jagged rows | --allow_jagged_rows |
| Unknown values | Ignore unknown values | --ignore_unknown_values |
| Quote | The value that is used to quote data sections in a CSV file | --quote |
| Encoding | The character encoding of the data | -E or --encoding |
To see the full list of options, visit the BigQuery documentation on loading CSV data from Cloud Storage. Now, let's look at some example commands to load data:
Specify schema using a JSON file:
bq --location=US load --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
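For reference, a minimal ./myschema.json for such a load might look like the following (the field names here are made up for illustration; use your table's actual columns):

```json
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "value", "type": "STRING", "mode": "NULLABLE"}
]
```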
If you want schema auto-detected from the file:
bq --location=US load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv
If you are writing to the existing table, BigQuery provides three options – Write if empty, Append to the table, Overwrite table. Also, it is possible to add new fields to the table while uploading data. Let us see each with an example.
To overwrite existing table:
bq --location=US load --autodetect --replace --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv
To append to an existing table:
bq --location=US load --autodetect --noreplace --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
To add a new field to the table (a new schema file with the extra field is supplied):
bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
Incremental dump and load
In certain use cases, loading data once from MySQL to BigQuery will not be enough. Once the initial data is extracted from the source, we may need to keep the target table in sync with the source. For a small table, doing a full data dump every time might be feasible, but if the data volume is higher, we should consider a delta (incremental) approach.
a. Extract data from MySQL
For incremental data extraction from MySQL, use SQL with proper predicates and write the output to a file. mysqldump cannot be used here, as it always extracts the full data.
E.g.: extracting rows based on the updated_timestamp column and converting to CSV:
mysql -B -u user database_name -h mysql_host -e "select * from table_name where updated_timestamp < now() and updated_timestamp > '#max_updated_ts_in_last_run#'" | sed 's/"/""/g;s/\t/","/g;s/^/"/;s/$/"/' > table_name_data.csv
Note: if any hard deletes happened in the source table, they will not be reflected in the target table.
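One way to manage the #max_updated_ts_in_last_run# placeholder between runs is a small watermark file. A minimal sketch follows (the file name and timestamp format are assumptions; the mysql call is commented out since it needs a live server):

```shell
WATERMARK_FILE=last_updated_ts.txt

# First run: start from the epoch so everything is extracted.
[ -f "$WATERMARK_FILE" ] || echo '1970-01-01 00:00:00' > "$WATERMARK_FILE"
LAST_TS=$(cat "$WATERMARK_FILE")

# Capture this run's cutoff BEFORE extracting, so rows updated mid-run
# are picked up next time rather than missed.
CUTOFF=$(date -u '+%Y-%m-%d %H:%M:%S')
echo "extracting rows where updated_timestamp > '$LAST_TS' and <= '$CUTOFF'"

# mysql -B -u user database_name -h mysql_host \
#   -e "select * from table_name where updated_timestamp > '$LAST_TS' and updated_timestamp <= '$CUTOFF'" > table_name_delta.csv

# Persist the cutoff only after a successful extract.
echo "$CUTOFF" > "$WATERMARK_FILE"
```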
b. Update target table in BigQuery
To upsert newly extracted data into the BigQuery table, first upload the data into a staging table; this will be a full load (refer to the full data load section above). Let's call this staging table delta_table. There are then two approaches to load the data into the final table:
1. Update the values of existing records in the final table, and insert new rows from the delta table which are not present in the final table.
UPDATE data_set.final_table t SET t.value = s.value FROM data_set.delta_table s WHERE t.id = s.id;
INSERT data_set.final_table (id, value) SELECT id, value FROM data_set.delta_table WHERE id NOT IN (SELECT id FROM data_set.final_table);
2. Delete rows from the final table which are present in the delta table. Then insert all rows from delta table to final table.
DELETE data_set.final_table f WHERE f.id IN (SELECT id FROM data_set.delta_table);
INSERT data_set.final_table (id, value) SELECT id, value FROM data_set.delta_table;
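BigQuery's standard SQL also supports MERGE, which combines either of the two approaches above into a single atomic statement. A sketch using the same id and value columns as the examples above:

```sql
MERGE data_set.final_table t
USING data_set.delta_table s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);
```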
Easier Way to move data from MySQL to BigQuery
With a ready-to-use Data Integration Platform like Hevo, you can move data from MySQL to BigQuery in just 3 simple steps. You do not need to write any code, and you get an error-free, fully managed setup to move data in minutes.
- Connect and configure your MySQL database.
- Select the replication mode: (a) load selected MySQL tables (b) load data via Custom Query (c) load data through Binlog.
- For each table in MySQL choose a table name in BigQuery where it should be copied.
It is that simple. While you relax, Hevo will take care of fetching the data and sending it to your destination warehouse.
In addition to this, Hevo lets you bring data from a wide array of sources – Cloud Apps, Databases, SDKs, and more. You can check out the complete list of available integrations.
Complex analytics on data requires moving it to warehouses like BigQuery. It takes multiple steps to extract the data, clean it, and upload it, and real effort to ensure there is no data loss at any stage of the process, whether due to data anomalies or type mismatches.