Are you looking to connect Cloud SQL to BigQuery? We have you covered. Moving data to a data warehouse like BigQuery makes large volumes of data available for deeper analysis. BigQuery is a powerful, cloud-based data warehouse and analytics platform that provides a cost-effective way to store, process, and analyze that data.
With the help of BigQuery, companies can reduce costs and gain insights faster.
In this article, we will discuss the advantages of Cloud SQL to BigQuery replication, the process of migration, and useful tips to help you make the move.
What is Cloud SQL?
Cloud SQL is a fully managed database service that lets you create, maintain, and manage relational databases on the Google Cloud Platform (GCP). It supports MySQL, PostgreSQL, and SQL Server.
You can use Cloud SQL with App Engine applications written in languages such as Java, Python, PHP, Node.js, Go, and Ruby. It’s popular because of its support for multiple database engines and programming languages.
How to Load Data from Cloud SQL to BigQuery?
There are primarily four strategies to move data from Cloud SQL to BigQuery:
1. Write Code to Directly Transfer from Source Database to BigQuery
Let’s take MySQL on Cloud SQL as an example.
Your MySQL instance will be adding data continuously, so you will need to schedule your BigQuery transfers at a regular, pre-decided interval.
You will also need a marker (a flag or bookmark) that always points at the last record inserted during the previous iteration of the load job.
Each new iteration then begins from the record after the marker. The marker can be a timestamp (stored with every record, telling you up to which record insertion has already been done) or a monotonically increasing unique key.
With every new iteration/load job, you start from the marker and transfer all records up to the latest added record into BigQuery.
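Here is a minimal Python sketch of this marker-based approach, assuming a MySQL “orders” table with an auto-increment id column and a BigQuery table that already exists; the connection details, table, and column names are all hypothetical:

import pymysql
from google.cloud import bigquery

BQ_TABLE = "your-project.your_dataset.orders"  # hypothetical destination table

def load_new_rows(last_marker: int) -> int:
    """Copy rows inserted after last_marker to BigQuery and return the new marker."""
    conn = pymysql.connect(host="10.0.0.5", user="replicator", password="secret",
                           database="shop", cursorclass=pymysql.cursors.DictCursor)
    with conn.cursor() as cur:
        # Fetch only the records added since the previous iteration.
        cur.execute("SELECT * FROM orders WHERE id > %s ORDER BY id", (last_marker,))
        rows = cur.fetchall()
    conn.close()

    if rows:
        client = bigquery.Client()
        # Streaming insert keeps the example short; values must be
        # JSON-serializable, so convert datetimes to ISO strings if needed.
        errors = client.insert_rows_json(BQ_TABLE, rows)
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")
        last_marker = rows[-1]["id"]  # advance the marker for the next run
    return last_marker

Persist the returned marker (for example, in a small metadata table) so the next scheduled run knows where to resume.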
Another option is to first export your source data into CSV format and let BigQuery import the CSVs, as sketched below.
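The import side of the CSV route can then be a plain load job pointed at your bucket; the bucket path and table name below are assumptions:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # infer the schema from the files
)
load_job = client.load_table_from_uri(
    "gs://your-bucket/exports/orders-*.csv",  # hypothetical export location
    "your-project.your_dataset.orders",       # hypothetical target table
    job_config=job_config,
)
load_job.result()  # block until the load job completes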
2. Using Federated Queries to Move Data from Cloud SQL to BigQuery
BigQuery supports Cloud SQL federated queries which let you query your Cloud SQL database directly from BigQuery.
BigQuery will run this federated SQL on its own, and pick up records from your Cloud SQL database.
Using the SQL function EXTERNAL_QUERY(), BigQuery can query data residing in Cloud SQL in real time, without copying or moving it.
Its syntax is as follows:
SELECT * FROM EXTERNAL_QUERY(Your_Source_Database_Connection_ID, external_database_query);
For example, suppose you have an “employee_salary” table in BigQuery and “employee” and “salary” tables in Cloud SQL for MySQL:
INSERT
  YourBigQueryDatabaseName.employee_salary
SELECT
  *
FROM
  EXTERNAL_QUERY(
    "CloudSQL_Database_connection_ID",
    "SELECT employee_Name, last_salary_drawn FROM employee e, salary s WHERE e.employee_ID = s.employee_ID");
EXTERNAL_QUERY executes the query in Cloud SQL and returns results as a temporary table where the source data types are automatically converted to BigQuery data types.
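For illustration, the same kind of federated query can also be issued from the BigQuery client library for Python; the connection ID and table names below are assumptions:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT *
FROM EXTERNAL_QUERY(
  "your-project.us.cloudsql-connection",  -- hypothetical connection ID
  "SELECT employee_ID, employee_Name FROM employee");
"""
for row in client.query(sql).result():
    print(dict(row))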
The downside of this Cloud SQL to BigQuery method is that you have to write and update your queries often as the data on both sides grows.
3. Cloud SQL to BigQuery through Cloud Pub/Sub
You can split the writing process: pick data from Cloud SQL, store it temporarily in Cloud Pub/Sub, and then have a Dataflow reader stream it into BigQuery.
First, create a connector/routine that publishes changes from Cloud SQL to Pub/Sub. Then write a Dataflow pipeline that consumes those changes from Pub/Sub and pushes them to BigQuery, as sketched below.
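Here is a minimal sketch of both halves in Python, assuming change events are published as JSON; the project, topic, and table names are hypothetical. The publishing routine can be as simple as:

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic = publisher.topic_path("your-project", "cloudsql-changes")
# Publish one changed row as a JSON-encoded message.
future = publisher.publish(
    topic, json.dumps({"id": 42, "status": "shipped"}).encode("utf-8"))
future.result()  # block until the message is accepted

And the consuming side, written as an Apache Beam pipeline that Dataflow can run:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "ReadChanges" >> beam.io.ReadFromPubSub(
           topic="projects/your-project/topics/cloudsql-changes")
     | "Parse" >> beam.Map(json.loads)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           "your-project:your_dataset.orders",
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table exists
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))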
If you have relational data at the source which must be denormalized/aggregated before the transfer, this approach works well.
Also, if you have legacy relational tables whose data types are much different from BigQuery and would need a lot of pre-processing, you can consider this option.
The upside is that you can reduce overall latency to just a few seconds or minutes.
The obvious downside is with transactional/OLTP data, which requires you to create a strong versioning scheme to track changes.
Finer details to remember:
- BigQuery imposes a 4 GB per-file limit on compressed files: if you export your Cloud SQL data in compressed JSON format and store it in a Google Cloud Storage bucket prior to moving it to BigQuery, you will have to split larger files into multiple chunks.
- BigQuery is optimized for append-heavy workloads: DML statements such as UPDATE and DELETE are supported but comparatively expensive, and rows still in the streaming buffer cannot be modified. So you must have a clear strategy to avoid duplicates and errors if your application demands it (see the MERGE-based sketch after this list).
- BigQuery does not guarantee data consistency for external data sources which are constantly updating.
- If the underlying data in Cloud SQL changes while a federated query/transfer is running, it can result in data inconsistency.
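As mentioned above, one common deduplication strategy is to land incoming rows in a staging table and periodically MERGE them into the main table. A sketch, with hypothetical dataset, table, and column names, run through the Python client:

from google.cloud import bigquery

client = bigquery.Client()
merge_sql = """
MERGE `your_dataset.orders` AS main
USING `your_dataset.orders_staging` AS staging
ON main.id = staging.id
WHEN MATCHED THEN
  UPDATE SET main.status = staging.status
WHEN NOT MATCHED THEN
  INSERT (id, status) VALUES (staging.id, staging.status);
"""
client.query(merge_sql).result()  # applies changes without creating duplicates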
4. The Simpler and More Efficient Option: Hevo
Hevo manages all these complexities for you in an efficient and secure manner. You can transfer your data to BigQuery without writing any code or taking care of nuances that can be time-consuming and annoying.
Configure Cloud SQL as a Source
Configure BigQuery as a Destination
Advantages of using the Hevo Data Platform to move data from Cloud SQL to BigQuery:
- Minimal Setup: You will require minimal setup and bandwidth to set up a Cloud SQL BigQuery connection using the Hevo platform.
- No Data Loss: Hevo’s architecture is fault-tolerant and allows easy, reliable, and seamless transfer of data to BigQuery without data loss.
- 150+ Out-of-the-Box Integrations: The Hevo platform brings data from sources such as SDKs, cloud applications, databases, and more into data warehouses and databases. So, Hevo is the right partner for all your growing data needs.
- Automatic Schema Detection and Mapping: The schema of incoming data is scanned automatically. If changes are detected, they are handled seamlessly and incorporated into the database or data warehouse.
- Exceptional Support: Hevo provides 24×7 technical support through email, calls, and chat.
Conclusion
In this blog post, you have learned four methods to move your data from Cloud SQL to BigQuery. For a hassle-free transfer, use Hevo. Hevo Data, a No-Code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write or maintain any code.
Getting data from many sources into destinations can be a time-consuming and resource-intensive task. Instead of spending months developing and maintaining such data integrations, you can enjoy a smooth ride with Hevo Data’s 150+ plug-and-play integrations (including 50+ free sources).
Visit our Website to Explore Hevo Data
Hevo Data’s pre-load data transformations save countless hours of manual data cleaning and standardizing, getting it done in minutes via a simple drag-and-drop interface or your custom Python scripts. There is no need to go to your data warehouse for post-load transformations: you can run complex SQL transformations from the comfort of Hevo’s interface and get your data in its final, analysis-ready form.
Want to take Hevo Data for a ride? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.