With the BigQuery MySQL Connector, users can perform data analysis on MySQL data stored in BigQuery without the need for complex data migration processes. With MySQL BigQuery integration, organizations can leverage the scalability and power of BigQuery for handling large datasets stored in MySQL.

Migrate MySQL to BigQuery can be a complex undertaking, necessitating thorough testing and validation to minimize downtime and ensure a smooth transition. This blog will provide 2 easy methods to connect MySQL to BigQuery in real time. The first method uses Hevo’s automated Data Pipeline to set up this connection while the second method involves writing custom ETL Scripts to perform this data transfer from MySQL to BigQuery. Read along and decide which method suits you the best!

Methods to Connect MySQL to BigQuery

Following are the 2 methods using which you can set up your MySQL to BigQuery integration:

Method 1: Using Hevo Data to Connect MySQL to BigQuery

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Get Started with Hevo for Free

With a ready-to-use Data Integration Platform, Hevo, you can easily move data from MySQL to BigQuery with just 2 simple steps. This does not need you to write any code and will provide you with an error-free, fully managed setup to move data in minutes.

Step 1: Connect and configure your MySQL database.

  • Click PIPELINES in the Navigation Bar.
  • Click + CREATE in the Pipelines List View.
  • In the Select Source Type page, select the MySQL as your source.
  • In the Configure your MySQL Source page, specify the connection settings for your MySQL Source.
MySQL to BigQuery: Configuring MySQL as source
Configuring MySQL as source

Step 2: Choose BigQuery as your Destination

  • Click DESTINATIONS in the Navigation Bar.
  • Click + CREATE in the Destinations List View.
  • In Add Destination page select Google BigQuery as the Destination type.
  • In the Configure your Google BigQuery Warehouse page, specify the following details:
MySQL to BigQuery: Configuring BigQuery as Destination
Configuring BigQuery as Destination

It is that simple. While you relax, Hevo will fetch the data and send it to your destination Warehouse.

Instead of building a lot of these custom connections, ourselves, Hevo Data has been really flexible in helping us meet them where they are.

– Josh Kennedy, Head of Data and Business Systems

In addition to this, Hevo lets you bring data from a wide array of sources – Cloud Apps, Databases, SDKs, and more. You can check out the complete list of available integrations.

SIGN UP HERE FOR A 14-DAY FREE TRIAL

Method 2: Manual ETL Process to Connect MySQL to BigQuery

The manual method of connecting MySQL to BigQuery involves writing custom ETL scripts to set up this data transfer process. This method can be implemented in 2 different forms:

1. Full Dump and Load

Full Dump and Load

This approach is relatively simple, where complete data from the source MySQL table is extracted and migrated to BigQuery. If the target table already exists, drop it and create a new table ( Or delete complete data and insert newly extracted data).

Full Dump and Load is the only option for the first-time load even if the incremental load approach is used for recurring loads. The full load approach can be followed for relatively small tables even for further recurring loads. You can also check out MySQL to Redshift integration.

MySQL to BigQuery Using Command Line Tool

The high-level steps to be followed to replicate MySQL to BigQuery are:

Let’s take a detailed look at each step to migrate sqlite to mariadb.

Step 1: Extract Data from MySQL

There are 2 popular ways to extract data from MySQL – using mysqldump and using SQL query.

  • Extract data using mysqldump

Mysqldump is a client utility coming with Mysql installation. It is mainly used to create a logical backup of a database or table. Here, is how it can be used to extract one table:

mysqldump -u <db_username> -h <db_host> -p db_name table_name > table_name.sql

Here output file table_name.sql will be in the form of insert statements like

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

This output has to be converted into a CSV file. You have to write a small script to perform this. Here is a well-accepted python library doing the same – mysqldump_to_csv.py

Alternatively, you can create a CSV file using the below command. However, this option works only when mysqldump is run on the same machine as the mysqld server which is not the case normally.

mysqldump -u [username] -p -t -T/path/to/directory [database] --fields-terminated-by=,
  • Extract Data using SQL query

MySQL client utility can be used to run SQL commands and redirect output to file.

mysql -B -u user database_name -h mysql_host   -e "select * from table_name;" > 
table_name_data_raw.txt

Further, it can be piped with text editing utilities like sed or awk to clean and format data.

Example:

mysql -B -u user database_name -h mysql_host -e "select * from table_name;" |   
sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"  > table_name_data.csv

Step 2: Clean and Transform the Data

Apart from transforming data for business logic, there are some basic things to keep in mind:

  • BigQuery expects CSV data to be UTF-8 encoded.
  • BigQuery does not enforce Primary Key and unique key constraints. ETL process has to take care of that.
  • Column types are slightly different. Most of the types have either equivalent or convertible types. Here is a list of common data types.
MySQL Data TypeBigQuery Data Type
TINYINTINT64
SMALLINTINT64
MEDIUMINTINT64
INTINT64
BIGINTINT64
DECIMALNUMERIC
FLOATFLOAT64
DOUBLEFLOAT64
BITBOOL
CHARSTRING
VARCHARSTRING
BINARYBYTES
VARBINARYBYTES
TINYTEXTSTRING
TEXTSTRING
MEDIUMTEXTSTRING
LONGTEXTSTRING
ENUMNo type for ENUM.Must use any type which can represent values in ENUM
SETNo type for SET.Must use any type which can represent values in SET
DATEDATE
TIMETIME
DATETIMEDATETIME
TIMESTAMPTIMESTAMP

Fortunately, the default date format in MySQL is the same, YYYY-MM-DD. Hence, while taking mysqldump there is no need to do any specific changes for this. If you are using a string field to store date and want to convert to date while moving to BigQuery you can use STR_TO_DATE function.
DATE value must be dash(-) separated and in the form YYYY-MM-DD (year-month-day). You can visit their official page to know more about BigQuery data types.

Syntax : 
STR_TO_DATE(str,format)
Example : 
SELECT STR_TO_DATE('31,12,1999','%d,%m,%Y');
Result : 
1999-12-31
  • The hh:mm: ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
  • Make sure text columns are quoted if it can potentially have delimiter characters.

Step 3: Upload to Google Cloud Storage(GCS)

Gsutil is a command-line tool for manipulating objects in GCS. It can be used to upload files from different locations to your GCS bucket.

To copy a file to GCS:

gsutil cp table_name_data.csv  gs://my-bucket/path/to/folder/

To copy an entire folder:

gsutil cp -r dir gs://my-bucket/path/to/parent/

If the files are present in S3, the same command can be used to transfer to GCS.

gsutil cp -R s3://bucketname/source/path gs://bucketname/destination/path

Storage Transfer Service

Storage Transfer Service from Google cloud is another option to upload files to GCS from S3 or other online data sources like HTTP/HTTPS location. Destination or sink is always a Cloud Storage bucket. It can also be used to transfer data from one GCS bucket to another.

This service is extremely handy when comes to data movement to GCS with support for:

  • Schedule one-time or recurring data transfer.
  • Delete existing objects in the destination if no corresponding source object is present.
  • Deletion of source object after transferring.
  • Periodic synchronization between source and sink with advanced filters based on file creation dates, file name, etc.

Upload from Web Console

If you are uploading from your local machine, web console UI can also be used to upload files to GCS. Here are the steps to upload a file to GCS with screenshots.

  1. Login to your GCP account. In the left bar, click Storage and go to Browser.
MySQL to BigQuery: Login to Google Cloud Platform
Log in to Google Cloud Platform

2. Select the GCS bucket you want to upload the file. Here the bucket we are using is test-data-hevo. Click on the bucket.

MySQL to BigQuery: Select the GCS Bucket
Select the GCS Bucket

3. On the bucket details page below, click the upload files button and select file from your system.

MySQL to BigQuery: Upload File from Computer
Upload File from Computer

4. Wait till the upload is completed. Now, the uploaded file will be listed in the bucket:

MySQL to BigQuery: Uploaded File Listed in Bucket
Uploaded File Listed in Bucket

Step 4: Upload to the BigQuery Table from GCS

You can use the bq command to interact with BigQuery. It is extremely convenient to upload data to the table from GCS.Use the bq load command, and specify CSV as the source_format.

The general syntax of bq load:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

[LOCATION] is your location. This is optional.
[FORMAT] is CSV.
[DATASET] is an existing dataset.
[TABLE] is the name of the table into which you’re loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema can be a local JSON file or inline.– autodetect flag also can be used instead of supplying a schema definition.

There are a bunch of options specific to CSV data load :

CSV optionDescriptionbq command flag
Field delimiterField delimiter: Comma, Tab, Pipe, Other-F or –field_delimiter
Header rowsHeader rows to skip–skip_leading_rows
Number of bad records allowedNumber of errors allowed–max_bad_records
Newline charactersAllow quoted newlines–allow_quoted_newlines
Custom null valuesSpecifies a string that represents a null value in a CSV file–null_marker
Trailing optional columnsAllow jagged rows–allow_jagged_rows
Unknown valuesIgnore unknown values–ignore_unknown_values
QuoteThe value that is used to quote data sections in a CSV file–quote
EncodingThe character encoding of the data-E or –encoding

To see full list options visit Bigquery documentation on loading data cloud storage CSV, visit here.

Following are some example commands to load data:

  • Specify schema using a JSON file:
bq --location=US load --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json
  • If you want schema auto-detected from the file:
bq --location=US load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/mydata.csv

If you are writing to the existing table, BigQuery provides three options – Write if empty, Append to the table, Overwrite table. Also, it is possible to add new fields to the table while uploading data. Let us see each with an example.

  • To overwrite the existing table:
bq --location=US load --autodetect --replace --source_format=CSV mydataset.mytable 
gs://mybucket/mydata.csv
  • To append to an existing table:
bq --location=US load --autodetect --noreplace --source_format=CSV mydataset.mytable 
gs://mybucket/mydata.csv ./myschema.json
  • To add a new field to the table. Here new schema file with an extra field is given :
bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION 
--source_format=CSV mydataset.mytable gs://mybucket/mydata.csv ./myschema.json

2. Incremental Dump and Load

MySQL to BigQuery: Incremental Dump and Load Image
Image Source

In certain use cases, loading data once from MySQL to BigQuery will not be enough. There might be use cases where once initial data is extracted from the source, we need to keep the target table in sync with the source. For a small table doing a full data dump every time might be feasible but if the volume data is higher, we should think of a delta approach.

The following steps are used in the Incremental approach to connect MySQL to Bigquery:

Step 1: Extract Data from MySQL

For incremental data extraction from MySQL use SQL with proper predicates and write output to file. mysqldump cannot be used here as it always extracts full data.

Eg: Extracting rows based on the updated_timestamp column and converting to CSV.

mysql -B -u user database_name -h mysql_host  -e "select * from table_name where
updated_timestamp < now() and  updated_timestamp >'#max_updated_ts_in_last_run#'"| 
sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"  > table_name_data.csv

Note: In case of any hard delete happened in the source table, it will not be reflected in the target table.

Step 2: Update Target Table in BigQuery

First, upload the data into a staging table to upsert newly extracted data to the BigQuery table. This will be a full load. Please refer full data load section above. Let’s call it delta_table. Now there are two approaches to load data to the final table:

  1. Update the values existing records in the final table and insert new rows from the delta table which are not in the final table.
UPDATE data_set.final_table t
SET t.value = s.value
FROM  data_set.delta_table s
WHERE t.id = s.id;
INSERT data_set.final_table (id, value)
SELECT id, value
FROM  data_set.delta_table
WHERE NOT id IN (SELECT id FROM data_set.final_table);

2. Delete rows from the final table which are present in the delta table. Then insert all rows from the delta table to the final table.

DELETE data_set.final_table f
WHERE f.id IN (SELECT id from data_set.delta_table);
INSERT data_set.final_table (id, value)
SELECT id, value
FROM  data_set.delta_table;

Disadvantages of Manually Loading Data

Manually loading data from MySQL to BigQuery presents several drawbacks:

  1. Cumbersome Process: While custom code suits one-time data movements, frequent updates become burdensome manually, leading to inefficiency and bulkiness.
  2. Data Consistency Issues: BigQuery lacks guaranteed data consistency for external sources, potentially causing unexpected behavior during query execution amidst data changes.
  3. Location Constraint: The data set’s location must align with the Cloud Storage Bucket’s region or multi-region, restricting flexibility in data storage.
  4. Limitation with CSV Format: CSV files cannot accommodate nested or repeated data due to format constraints, limiting data representation possibilities.
  5. File Compression Limitation: Mixing compressed and uncompressed files in the same load job using CSV format is not feasible, adding complexity to data loading tasks.
  6. File Size Restriction: The maximum size for a gzip file in CSV format is capped at 4 GB, potentially limiting the handling of large datasets efficiently.

What Can Be Migrated From MySQL To BigQuery?

Since the 1980s, MySQL has been the most widely used open-source relational database management system (RDBMS), with businesses of all kinds using it today.

MySQL is fundamentally a relational database. It is renowned for its dependability and speedy performance and is used to arrange and query data in systems of rows and columns. 

MySQL to BigQuery
Image Source

Both MySQL and BigQuery use tables to store their data. When you migrate a table from MySQL to BigQuery, it is stored as a standard, or managed, table.

Both MySQL and BigQuery employ SQL, but they accept distinct data types, therefore you’ll need to convert MySQL data types to BigQuery equivalents. Depending on the data pipeline you utilize, there are several options for dealing with this.

Once in BigQuery, the table is encrypted and kept in Google’s warehouse. Users may execute complicated queries or accomplish any BigQuery-enabled job.

The Advantages of Connecting MySQL To BigQuery

MySQL to BigQuery
  • BigQuery is intended for efficient and speedy analytics, and it does so without compromising operational workloads, which you will most likely continue to manage in MySQL.
  • It improves workflows and establishes a single source of truth. Switching between platforms can be difficult and time-consuming for analysts. Updating BigQuery with MySQL ensures that both data storage systems are aligned around the same source of truth and that other platforms, whether operational or analytical, are constantly bringing in the right data.
  • BigQuery increases data security. By replicating data from MySQL to BigQuery, customers avoid the requirement to provide rights to other data engineers on operational systems.
  • BigQuery handles Online Analytical Processing (OLAP), whereas MySQL is designed for Online Transaction Processing (OLTP). Because it is a cost-effective, serverless, and multi-cloud data warehouse, BigQuery can deliver deeper data insights and aid in the conversion of large data into useful insights.

Conclusion

The article listed 2 methods to set up your BigQuery MySQL integration. The first method relies on Hevo’s automated Data Pipeline to transfer data, while the second method requires you to write custom scripts to perform ETL processes from MySQL to BigQuery.

Complex analytics on data requires moving data to Data Warehouses like BigQuery. It takes multiple steps to extract data, clean it and upload it. It requires real effort to ensure there is no data loss at each stage of the process, whether it happens due to data anomalies or type mismatches.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. Check out Hevo pricing to choose the best plan for your organization.

Share your understanding of connecting MySQL to BigQuery in the comments section below!

mm
Freelance Technical Content Writer, Hevo Data

Faisal loves data science and combines his problem-solving ability and passion for writing to help data teams in solving complex business problems.

No-code Data Pipeline for BigQuery