Are you looking to connect Cloud SQL to BigQuery? We have you covered. Moving data to a data warehouse like BigQuery helps you store large amounts of data for better analysis. BigQuery is a powerful cloud-based data warehouse and analytics platform that provides a cost-effective solution to store, process, and analyze large amounts of data.

With the help of BigQuery, companies can reduce costs and gain insights faster. In this article, we will discuss the advantages of Cloud SQL to BigQuery replication, the process of migration, and useful tips to help you make the move.

What is Cloud SQL?

Cloud SQL is a fully managed database service that lets you create, maintain, and manage relational databases on the Google Cloud Platform (GCP). It supports MySQL, PostgreSQL, and SQL Server, all hosted on GCP.

You can use Cloud SQL with App Engine applications that are written in many languages like Java, Python, PHP, Node.js, Go, Ruby, etc. It’s popular because of its support for multiple database engines and programming languages.

What is BigQuery?

BigQuery is Google’s cloud enterprise data warehouse, built to run complex SQL queries and analyze huge datasets efficiently. It is based on Dremel, a Google technology that uses columnar storage and a tree architecture to support high-speed scanning of data for querying efficiency.

BigQuery is serverless, highly scalable, and managed on Google’s cloud infrastructure. It monitors access patterns and dynamically optimizes its data structures to keep storage efficient and performance high as your data changes.

Use Cases of Migrating Data from Cloud SQL to BigQuery

I have highlighted some key use cases of migrating data from Cloud SQL to BigQuery.

  • Advanced Analytics: BigQuery offers advanced analytics; you can use it for big data analysis and running complex queries.
  • Scalability: Scale with growing data volume using BigQuery’s serverless architecture.
  • Cost Effectiveness: Optimize costs through BigQuery’s pay-as-you-go model for large datasets.
  • Real-Time Data Processing: Enable real-time data processing and querying for insights and reporting in real-time.
  • Advanced Machine Learning: Combine this with BigQuery ML to construct and deploy machine learning models directly in your data warehouse.
  • Unified Data Environment: Integrate data from Cloud SQL and other sources into BigQuery. It can act as one single go-to platform for all your analysis and reporting.

How to Load Data from Cloud SQL to BigQuery? 

In this section, we will discuss the different methods to import data from Cloud SQL to BigQuery. There are six main strategies to move data from Cloud SQL to BigQuery:

Ways to Load Data from CloudSQL to BigQuery

Method 1: The Simpler and More Efficient Option - Hevo
With Hevo, you can effortlessly load your CloudSQL data into BigQuery in minutes. All you need to do is configure the source and destination by giving credentials to access your data. Streamline your data management by choosing Hevo.

Method 2: Write Code to Directly Transfer from Source Database to BigQuery
Writing code to directly transfer data from a source database to BigQuery allows for custom data handling and transformation, but it can be complex and time-consuming to develop and maintain.

Method 3: Using Federated Queries to Move Data from Cloud SQL to BigQuery
Using federated queries allows BigQuery to query data directly in Cloud SQL without moving it, enabling seamless integration and analysis. However, this method can suffer from performance issues and latency due to querying data across different systems in real time.

Method 4: Cloud SQL to BigQuery through Cloud Pub/Sub
Transferring data from Cloud SQL to BigQuery via Cloud Pub/Sub involves streaming data in real time, which is useful for continuous updates and low-latency processing. However, this method can be complex to set up and may incur higher costs.

Method 5: Using GCP to Create a Data Pipeline
Using Google Cloud Platform (GCP) to create a data pipeline automates the transfer of data from Cloud SQL to BigQuery, facilitating efficient and scalable data integration. However, this method may require significant initial setup and configuration.

Method 6: Using Cloud Dataflow to Create a Data Pipeline
Using Cloud Dataflow to create a data pipeline allows for scalable and flexible data processing from Cloud SQL to BigQuery with built-in data transformation capabilities. However, it can be complex to configure and manage and may involve higher costs due to resource consumption and processing requirements.


Method 1: The Simpler and More Efficient Option - Hevo

Prerequisites: Ensure your Google Cloud MySQL instance is active, MySQL version is 5.5 or higher, and BinLog replication is enabled. Have your database credentials ready.

  • User Setup: Confirm the database user has SELECT and REPLICATION privileges. If needed, create a new user with these privileges.
  • Hevo Setup: Log in to your Hevo account and whitelist Hevo’s IP on your Google Cloud MySQL instance.
  • Configure Pipeline: In Hevo, create a new Pipeline with a unique name, select ingestion mode, and select CloudMySQL as the source. Enter the specified details and complete the source setup.
  • Test & Proceed: Test the connection in Hevo, then proceed to set up the Destination.

Step 1: Configure Cloud SQL as a Source 

Cloud SQL to BigQuery: Configure Source

Step 2: Configure BigQuery as a Destination

Here are the steps to set up BigQuery as the Destination.

  • In the Hevo dashboard, access the destinations option by clicking on the DESTINATIONS button located in the Navigation Bar.
  • Next, create a new destination by selecting the + CREATE button in the Destinations List View.
  • In the Add Destination page, choose Google BigQuery as the Destination type.
  • Once you’ve selected Google BigQuery, you will be redirected to the Configure your Google BigQuery Warehouse page, where you can set up your warehouse according to your preferences.
Cloud SQL to BigQuery: Configure Destination
  • Learn how to set up CloudSQL as a Source here.
  • Learn how to set up BigQuery as a Destination here.

Key features of Hevo Data Platform to move data from Cloud SQL to BigQuery:

  • Data Transformation: Hevo provides a simple interface to clean, modify, and enrich the data you want to transfer.
  • Schema Management: Hevo automatically detects the schema of the incoming data and maps it to the destination schema.
  • Incremental Data Load: Hevo transfers only the data that has been modified, in real time, ensuring efficient utilization of bandwidth on both ends.

Method 2: Write Code to Directly Transfer from Source Database to BigQuery

Let’s take MySQL on Cloud SQL as an example.

Your MySQL instance will be adding data continuously, so you will need to schedule your BigQuery transfers at a regular, pre-decided interval. You will also need a marker (a flag or bookmark) that always points at the last record inserted by the previous iteration of the load job.

Each new iteration then begins from the record after the marker. The marker can be a timestamp stored with every record (telling you up to which record insertion has already been done) or a unique, monotonically increasing key.

With every new iteration of the load job, you start from the marker and transfer all records up to the latest one into BigQuery.

Another option is to first export your source data to CSV and let BigQuery import your CSV files.
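The marker-based logic above can be sketched as follows. This is a minimal illustration, not a production pipeline: `sqlite3` stands in for the Cloud SQL source, and a plain list stands in for the BigQuery destination; the table and column names are hypothetical. In a real job, the `SELECT` would run against Cloud SQL and the append would be a BigQuery load or streaming insert.

```python
import sqlite3

def incremental_load(source_conn, destination, marker):
    """Transfer all rows inserted after `marker` (a monotonically
    increasing id) and return the new marker value."""
    rows = source_conn.execute(
        "SELECT id, name FROM employee WHERE id > ? ORDER BY id", (marker,)
    ).fetchall()
    destination.extend(rows)
    # If nothing new arrived, keep the old marker.
    return rows[-1][0] if rows else marker

# Simulated source database with two initial rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])

warehouse = []
marker = incremental_load(conn, warehouse, 0)        # first run loads ids 1 and 2

conn.execute("INSERT INTO employee VALUES (3, 'Li')")
marker = incremental_load(conn, warehouse, marker)   # next run loads only id 3
```

Because each run only reads rows past the marker, re-running the job never re-transfers records that are already in the warehouse.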

Method 3: Using Federated Queries to Move Data from Cloud SQL to BigQuery

BigQuery supports Cloud SQL federated queries, which let you query your Cloud SQL database directly from BigQuery. BigQuery runs the federated SQL itself and picks up records from your Cloud SQL database.

Using the SQL function EXTERNAL_QUERY(), BigQuery can query data residing in Cloud SQL in real time, without copying or moving data.

Its syntax is as follows:

SELECT * FROM EXTERNAL_QUERY(Your_Source_Database_Connection_ID, external_database_query); 

For example, suppose you have an “employee_salary” table in BigQuery, and “employee” and “salary” tables in Cloud SQL for MySQL:

INSERT
   YourBigQueryDatabaseName.employee_salary
SELECT
   *
FROM
   EXTERNAL_QUERY(
      "CloudSQL_Database_connection_ID",
      "SELECT employee_Name, last_salary_drawn FROM employee e, salary s WHERE e.employee_ID = s.employee_ID");

EXTERNAL_QUERY executes the query in Cloud SQL and returns the results as a temporary table, with the source data types automatically converted to BigQuery data types.

To view all tables in a database:

SELECT * FROM EXTERNAL_QUERY(
   "connection_id",
   "select * from information_schema.tables;");

To view a table’s columns:

SELECT * FROM EXTERNAL_QUERY(
   "connection_id",
   "select * from information_schema.columns where table_name='x';");

The downside to this Cloud SQL to BigQuery method is that you have to write and frequently update your queries as the data on both sides grows.
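If you script these federated queries rather than typing them into the console, a small helper that composes the statement keeps the connection ID and inner query in one place. The sketch below only builds the SQL string; the table name and connection ID are hypothetical placeholders, and the commented-out `google-cloud-bigquery` call shows how you would submit it.

```python
def build_federated_insert(bq_table: str, connection_id: str, source_query: str) -> str:
    """Compose a BigQuery statement that pulls rows from Cloud SQL via
    EXTERNAL_QUERY and appends them to a BigQuery table."""
    # source_query executes inside Cloud SQL; the outer statement runs in BigQuery,
    # so double quotes inside it must be escaped.
    escaped = source_query.replace('"', '\\"')
    return (
        f"INSERT INTO `{bq_table}`\n"
        "SELECT * FROM EXTERNAL_QUERY(\n"
        f'  "{connection_id}",\n'
        f'  "{escaped}");'
    )

sql = build_federated_insert(
    "my_dataset.employee_salary",            # hypothetical BigQuery table
    "my-project.us.cloudsql-connection",     # hypothetical connection ID
    "SELECT employee_Name, last_salary_drawn FROM employee e "
    "JOIN salary s ON e.employee_ID = s.employee_ID",
)
# To run it, the string can be submitted with the BigQuery client library:
#   from google.cloud import bigquery
#   bigquery.Client().query(sql).result()
print(sql)
```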


Method 4: Cloud SQL to BigQuery through Cloud Pub/Sub

You can split the writing process: pick up data from Cloud SQL, store it temporarily in Cloud Pub/Sub, and then have a Dataflow reader stream it into BigQuery.

First, create a connector or routine that publishes changes from Cloud SQL to Pub/Sub. Then write a Dataflow pipeline that consumes those changes from Pub/Sub and pushes them to BigQuery.

If you have relational data at the source which must be denormalized/aggregated before the transfer, this approach works well. 

Also, if you have legacy relational tables whose data types are much different from BigQuery and would need a lot of pre-processing, you can consider this option. 

The upside is that you can reduce overall latency to just a few seconds or minutes. 

The obvious downside is with transactional/OLTP data, which requires a strong versioning scheme to track changes.

Finer details to remember:

  • BigQuery imposes a 4 GB per-file limit: if your Cloud SQL export produces JSON files in a Google Cloud Storage bucket prior to moving the data to BigQuery, you will have to split your file(s) into multiple chunks.
  • BigQuery is “append-only”, i.e., it does not allow you to edit an existing record; all writes are treated as immutable. So you must have a clear strategy to avoid duplication and errors if your application demands it.
  • BigQuery does not guarantee data consistency for external data sources that are constantly updating.
  • If the underlying data in Cloud SQL changes while a federated query/transfer is running, it can result in data inconsistency.
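The file-splitting point above is mechanical enough to sketch. This minimal example splits a newline-delimited JSON export into chunks under a byte budget; a tiny 512-byte limit is used for illustration in place of BigQuery’s 4 GB limit, and the file paths are temporary placeholders.

```python
import json
import os
import tempfile

def split_ndjson(path, max_bytes):
    """Split a newline-delimited JSON export into chunk files no larger
    than max_bytes each. Returns the list of chunk file paths.
    (A single line longer than max_bytes is still written whole.)"""
    chunks, buf, size = [], [], 0
    with open(path) as f:
        for line in f:
            if buf and size + len(line) > max_bytes:
                chunks.append(_flush(path, buf, len(chunks)))
                buf, size = [], 0
            buf.append(line)
            size += len(line)
    if buf:
        chunks.append(_flush(path, buf, len(chunks)))
    return chunks

def _flush(path, lines, index):
    out = f"{path}.part{index}"
    with open(out, "w") as f:
        f.writelines(lines)
    return out

# Write a small simulated export, then split it.
src = os.path.join(tempfile.mkdtemp(), "export.json")
with open(src, "w") as f:
    for i in range(100):
        f.write(json.dumps({"id": i, "name": f"row{i}"}) + "\n")

parts = split_ndjson(src, max_bytes=512)
```

Each resulting part file can then be loaded into BigQuery as a separate load job, staying under the per-file limit.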

Method 5: Using GCP to Create a Data Pipeline

Step 1: Enable API

Go to the GCP console and enable the BigQuery Connection API.

Step 2: Set Cloud SQL to Public

Navigate to your Cloud SQL instance’s Connections tab, enable the public IP, and save your settings.

Step 3: Authorize Address

Provide an authorized address for the connection to complete.

Step 4: Add External Data Source

In BigQuery, add an external data source and enter details about your Cloud SQL data.


Step 5: Share Connection

Assign roles and email addresses for user access by clicking on ‘Share Connection’.

Step 6: Query with Federated Queries

Use the EXTERNAL_QUERY function in BigQuery to query your Cloud SQL data.

Remember to check for permissions, supported regions for federated queries, and compatible data types between Cloud SQL and BigQuery. If you encounter any issues, consult your administrator or refer to the documentation for troubleshooting.

Method 6: Using Cloud Dataflow to Create a Data Pipeline

  • Step 1: Create a routine/connector to publish changes from Cloud SQL to Pub/Sub.
  • Step 2: Write a Dataflow pipeline that consumes changes from Pub/Sub and migrates them to BigQuery.

This method is ideal when relational source data needs aggregation or denormalization before migration. Also, ensure you have a versioning schema to track changes if transporting OLTP/transactional data.

Remember, BigQuery treats all writes as immutable, so plan to avoid duplication and errors. Also, BigQuery has a 4 GB/file limit, so you may need to fragment large files before migration. If data in Cloud SQL changes during transfer, it could lead to inconsistencies.
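The versioning scheme mentioned above can be sketched in pure Python. Every change from the OLTP source is appended as a (key, version, payload) row, and the current state is recovered by keeping the highest version per key; duplicate deliveries (common with Pub/Sub’s at-least-once semantics) then become harmless. The row shapes and field names here are illustrative, not a Hevo or BigQuery API.

```python
def latest_state(appended_rows):
    """Collapse an append-only change log into the latest row per key."""
    state = {}
    for key, version, payload in appended_rows:
        # Keep only the highest version seen for each key.
        if key not in state or version > state[key][0]:
            state[key] = (version, payload)
    return {k: payload for k, (version, payload) in state.items()}

log = [
    ("emp-1", 1, {"salary": 100}),
    ("emp-2", 1, {"salary": 200}),
    ("emp-1", 2, {"salary": 150}),   # an update arrives as a new appended row
    ("emp-1", 2, {"salary": 150}),   # a duplicate delivery is simply ignored
]
current = latest_state(log)
```

Inside BigQuery itself, the same collapse is typically expressed with a window function such as `ROW_NUMBER() OVER (PARTITION BY key ORDER BY version DESC)` and a filter for row number 1.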

Key Benefits of Migrating From CloudSQL to BigQuery

  • Dynamic Resource Allocation: BigQuery dynamically allocates resources based on usage, eliminating the need to provision and forecast compute and storage resources in advance.
  • Fast Analytics: Offers fast analytics on a petabyte scale due to its unique capabilities and columnar data store, which ensures high data compression and minimized data scanning.
  • Scalability: BigQuery’s scalability and performance allow for quick and cost-effective querying on large datasets.
  • Integration: Seamlessly integrates with other Google Cloud services, enhancing data analysis and insight generation.

For a closer comparison of the two services, you can also explore Google Cloud SQL vs. BigQuery.

Conclusion

In this blog post, you have learned six methods to move your data from Cloud SQL to BigQuery. For a hassle-free transfer, use Hevo. Hevo Data, a No-Code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write code repeatedly.

Getting data from many sources into destinations can be a time-consuming and resource-intensive task. Instead of spending months developing and maintaining such data integrations, you can enjoy a smooth ride with Hevo Data’s 150+ plug-and-play integrations (including 50+ free sources).

You can run complex SQL transformations from the comfort of Hevo’s interface and get your data in the final analysis-ready form. 

FAQ on CloudSQL to BigQuery

How do I migrate from Cloud SQL to BigQuery?

To migrate from Cloud SQL to BigQuery, export your data from Cloud SQL to a file format supported by BigQuery, such as CSV or JSON. Then create a dataset in BigQuery and use the web UI to load the exported data into tables.

Is Cloud SQL the same as BigQuery?

Cloud SQL is focused on hosting traditional relational databases like MySQL, PostgreSQL, and SQL Server. In contrast, BigQuery is optimized for analyzing large datasets using SQL queries for data warehousing and analytics purposes.

How to connect Cloud SQL in GCP?

To connect to Cloud SQL in GCP, select your Cloud SQL instance, note the connection details, and configure the firewall rules to allow your IP address to access Cloud SQL.

Is it possible to make a federated query from Cloud SQL to BigQuery?

Yes, it is possible to make federated queries from Cloud SQL to BigQuery using the BigQuery federated query feature.

Pratik Dwivedi
Technical Content Writer, Hevo Data

Pratik Dwivedi is a seasoned expert in data analytics, machine learning, AI, big data, and business intelligence. With over 18 years of experience in system analysis, design, and implementation, including 8 years in a Techno-Managerial role, he has successfully managed international clients and led teams on various projects. Pratik is passionate about creating engaging content that educates and inspires, leveraging his extensive technical and managerial expertise.
