Unlock the full potential of your Cloud SQL data by integrating it seamlessly with BigQuery. With Hevo’s automated pipeline, get data flowing effortlessly—watch our 1-minute demo below to see it in action!
Are you looking to connect Cloud SQL to BigQuery? We have you covered. Moving data to a data warehouse like BigQuery helps you store large amounts of data for better analysis. BigQuery is a powerful cloud-based data warehouse and analytics platform that provides a cost-effective solution to store, process, and analyze large amounts of data.
With the help of BigQuery, companies can reduce costs and gain insights faster. In this article, we will discuss the advantages of Cloud SQL to BigQuery replication, the process of migration, and useful tips to help you make the move.
What is Cloud SQL?
Cloud SQL is a fully-managed database service using which you can create, maintain and manage your databases on the Google Cloud Platform(GCP). It can be used with MySQL, PostgreSQL, or SQL Server; hosted on the GCP.
You can use Cloud SQL with App Engine applications that are written in many languages like Java, Python, PHP, Node.js, Go, Ruby, etc. It’s popular because of its support for multiple database engines and programming languages.
What is BigQuery?
BigQuery is Google’s cloud enterprise data warehouse that primarily serves business agility in running complex SQL queries and performing analysis on huge datasets efficiently. It is based on Google technology called Dremel, using columnar storage and tree architecture to support high-speed scanning of data for querying efficiency.
BigQuery is serverless, highly scalable, and enlists Google’s infrastructure in the cloud for management. It uses access patterns to maintain efficiency in storage by dynamically optimizing data structures for the handling of data in a changing environment for optimal performance.
Use Cases of Migrating Data from Cloud SQL to BigQuery
I have highlighted some key use cases of migrating data from Cloud SQL to BigQuery.
- Advanced Analytics: BigQuery offers advanced analytics; you can use it for big data analysis and running complex queries.
- Scalability: Scale with growing data volume using BigQuery’s serverless architecture.
- Cost Effectiveness: Optimize costs through BigQuery’s pay-as-you-go model for large datasets.
- Real-Time Data Processing: Enable real-time data processing and querying for insights and reporting in real-time.
- Advanced Machine Learning: Combine this with BigQuery ML to construct and deploy machine learning models directly in your data warehouse.
- Unified Data Environment: Integrate data from Cloud SQL and other sources into BigQuery. It can act as one single go-to platform for all your analysis and reporting.
How to Load Data from Cloud SQL to BigQuery?
In this section, we will discuss the different methods to import data from cloud SQL to Bigquery. There can be primarily 4 strategies to move data from Cloud SQL to BigQuery via CloudSQL :
Method 1: The Simpler and Efficient Option- Hevo
Prerequisites: Ensure your Google Cloud MySQL instance is active, MySQL version is 5.5 or higher, and BinLog replication is enabled. Have your database credentials ready.
- User Setup: Confirm the database user has SELECT and REPLICATION privileges. If needed, create a new user with these privileges.
- Hevo Setup: Log in to your Hevo account and whitelist Hevo’s IP on your Google Cloud MySQL instance.
- Configure Pipeline: In Hevo, create a new Pipeline with a unique name, select ingestion mode, and select CloudMySQL as the source. Enter the specified details and complete the source setup.
- Test & Proceed: Test the connection in Hevo, then proceed to set up the Destination.
Step 1: Configure Cloud SQL as a Source
Step 2: Configure BigQuery as a Destination
Here are the steps to setup BigQuery as the Destination.
- In the Hevo dashboard, access the destinations option by clicking on the DESTINATIONS button located in the Navigation Bar.
- Next, create a new destination by selecting the + CREATE button in the Destinations List View.
- In the Add Destination page, choose Google BigQuery as the Destination type.
- Once you’ve selected Google BigQuery, you will be redirected to the Configure your Google BigQuery Warehouse page, where you can set up your warehouse according to your preferences.
- Learn how to set up CloudSQL as a Source here.
- Learn how to setup BigQuery as a Destination here.
Key features of Hevo Data Platform to move data from Cloud SQL to BigQuery:
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Schema Management: Hevo can automatically detect the schema of the incoming data and maps it to the destination schema.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
Load your Data into BigQuery seamlessly
No credit card required
Method 2: Write Code to Directly Transfer from Source Database to BigQuery
Let’s take MySQL for Cloud SQL, as an example.
Your MySQL instance will be adding data continuously, hence you will need to schedule your Bigquery transfers regularly, at some pre-decided interval. Also, you will have to use a marker or a flag, or a bookmark, to always point at the last record inserted in the previous iteration of the load job.
Your current iteration will now begin from the record next to the marker. This marker can be a timestamp (stored for every record, tells you the records till which the insertion has already been done) OR a unique key.
With every new iteration/load job, you will start from the marker and transfer all records till the latest added record, to BigQuery.
Another option here can be to first export your source data into CSV format and allow BigQuery to import your CSVs.
Method 3: Using Federated Queries to Move Data from Cloud SQL to BigQuery
BigQuery supports Cloud SQL federated queries which let you query your Cloud SQL database directly from BigQuery. BigQuery will run this federated SQL on its own, and pick up records from your Cloud SQL database.
Using the new SQL function EXTERNAL_QUERY(), BigQuery can query data residing in Cloud SQL in real-time, without copying or moving data.
Its syntax is as follows:
SELECT * FROM EXTERNAL_QUERY(Your_Source_Database_Connection_ID, external_database_query);
E.g. If you have an “employee_salary” table in BigQuery and an “employee”/”salary” table in Cloud MySQL.
INSERT
YourBigQueryDatabaseName.employee_salary
SELECT
*
FROM
EXTERNAL_QUERY(
"CloudSQL_Database_connection_ID",
"SELECT employee_Name, last_salary_drawn FROM employee e, salary s WHERE e.employee_ID = s.employee_ID ");
EXTERNAL_QUERY executes the query in Cloud SQL and returns results as a temporary table where the source data types are automatically converted to BigQuery data types.
To view all tables in a database,
SELECT
*
FROM
EXTERNAL_QUERY
(
"connection_id"
,
"select * from information_schema.tables;"
);
To View the table’s columns,
SELECT
*
FROM
EXTERNAL_QUERY
(
"connection_id"
,
"select * from information_schema.columns where table_name='x';"
);
The downside to this Cloud SQL to BigQuery method is that you have to code and update your queries often, as your data on both sides grows.
Integrate SQL Server on Google Cloud SQL to BigQuery
Integrate SQL Server on Google Cloud SQL to BigQuery
Integrate PostgreSQL on Google Cloud SQL to BigQuery
Method 4: Cloud SQL to BigQuery through Cloud Pub/Sub
You can split the writing process to pick data from CloudSQL and store it in Cloud Pub/Sub, temporarily, and then have a Dataflow reader to stream into BigQuery.
First, create a connector/routine that publishes changes from Cloud SQL to Pub/Sub. Then write a Dataflow pipeline that consumes those changes from PubSub, and pushes them to BigQuery.
If you have relational data at the source which must be denormalized/aggregated before the transfer, this approach works well.
Also, if you have legacy relational tables whose data types are much different from BigQuery and would need a lot of pre-processing, you can consider this option.
The upside is that you can reduce overall latency to just a few seconds or minutes.
The obvious downside is with transactional/OLTP data, which will require the creation of a strong versioning scheme, to track changes.
Finer details to remember:
- BigQuery imposes a four-gigabyte per-file limit:– If your Cloud SQL is extracting data in JSON format, and storing it in a Google Cloud Storage bucket, prior to moving it to BigQuery, you will have to split your file(s) into multiple chunks.
- BigQuery is “append-only“, i.e. it does not allow you to edit any existing record, all writes are treated as immutable. So you must have a clear strategy to avoid duplication and errors if your application demands so.
- BigQuery does not guarantee data consistency for external data sources which are constantly updating.
- If the underlying data in CloudSQL changes while a federated query/transfer is running, it can result in data inconsistency.
Method 5: Using GCP to Create a DataPipeline
Step 1: Enable API
Go to the GCP console and enable the BigQuery Connection API.
Step 2: Set Cloud SQL to Public
Navigate to your Cloud SQL instance’s Connection tab and enable the public IP and save your settings.
Step 3: Authorize Address
Provide an authorized address for the connection to complete.
Step 4: Add External Data Source
In BigQuery, add an external data source and enter details about your Cloud SQL data.
Image Source
Step 5: Share Connection
Assign roles and email addresses for user access by clicking on ‘Share Connection’.
Step 6: Query with Federated Queries
Use the EXTERNAL_QUERY function in BigQuery to query your Cloud SQL data.
Remember to check for permissions, supported regions for federated queries, and compatible data types between Cloud SQL and BigQuery. If you encounter any issues, consult your administrator or refer to the documentation for troubleshooting.
Method 6: Using Cloud Dataflow to Create a Data Pipeline
- Step 1: Create a routine/connector to publish changes from Cloud SQL to Pub/Sub.
- Step 2: Write a Dataflow pipeline that consumes changes from Pub/Sub and migrates them to BigQuery.
This method is ideal when relational source data needs aggregation or denormalization before migration. Also, ensure you have a versioning schema to track changes if transporting OLTP/transactional data.
Remember, BigQuery treats all writes as immutable, so plan to avoid duplication and errors. Also, BigQuery has a 4 GB/file limit, so you may need to fragment large files before migration. If data in Cloud SQL changes during transfer, it could lead to inconsistencies.
Key Benefits of Migrating From CloudSQL to BigQuery
- Dynamic Resource Allocation: BigQuery dynamically allocates resources based on usage, eliminating the need to provision and forecast compute and storage resources in advance
- Fast Analytics: Offers fast analytics on a petabyte scale due to its unique capabilities and columnar data store, which ensures high data compression and minimized data scanning.
- Scalability: BigQuery’s scalability and performance allow for quick and cost-effective querying on large datasets.
- Integration: Seamlessly integrates with other Google Cloud services, enhancing data analysis and insight generation.
You can explore migrating data from Google Cloud SQL vs BigQuery.
Conclusion
In this blog post, you have learned 4 methods to move your data from Cloud SQL to BigQuery. For a hassle-free transfer, use Hevo. Hevo Data, a No-Code Data Pipeline, helps you transfer data from a source of your choice in a fully-automated and secure manner without having to write the code repeatedly.
Getting data from many sources into destinations can be a time-consuming and resource-intensive task. Instead of spending months developing and maintaining such data integrations, you can enjoy a smooth ride with Hevo Data’s 150+ plug-and-play integrations (including 50+ free sources).
You can run complex SQL transformations from the comfort of Hevo’s interface and get your data in the final analysis-ready form.
FAQ on CloudSQL to BigQuery
How do I migrate from Cloud SQL to BigQuery?
In order to migrate from Cloud SQL to BigQuery, you can export your data from Cloud SQL to a file supported by BigQuery, like CSV or JSON; you can create a dataset in BigQuery and use the web UI to load the exported data into the tables.
Is Cloud SQL the same as BigQuery?
Cloud SQL is focused on hosting traditional relational databases like MySQL, PostgreSQL, and SQL Server. In contrast, BigQuery is optimized for analyzing large datasets using SQL queries for data warehousing and analytics purposes.
How to connect Cloud SQL in GCP?
To connect Cloud SQL in GCP, you can select your Cloud SQL instance and note the connection details, then configure the firewall rules to allow your IP address to access Cloud SQL.
Is it posible to make a federated query from Cloud SQL to BigQuery?
Yes, it is possible to make federated queries from Cloud SQL to BigQuery using the BigQuery federated query feature.
Pratik Dwivedi is a seasoned expert in data analytics, machine learning, AI, big data, and business intelligence. With over 18 years of experience in system analysis, design, and implementation, including 8 years in a Techno-Managerial role, he has successfully managed international clients and led teams on various projects. Pratik is passionate about creating engaging content that educates and inspires, leveraging his extensive technical and managerial expertise.