As businesses continue to generate massive amounts of data, the need for efficient and scalable data storage and analysis solutions becomes increasingly important. Two popular options for data warehousing are Google BigQuery and Azure Synapse Analytics, both of which offer powerful features for processing large datasets. By connecting BigQuery to Azure Synapse, you can leverage the best of both worlds.
Building an all-new data connector is challenging, especially when you are already overloaded with managing & maintaining your existing custom data pipelines. To fulfill your business team’s ad-hoc BigQuery to Azure Synapse connection request, you’ll have to invest a significant portion of your engineering bandwidth.
We know you are short on time & need a quick way out. No worries, this article provides a step-by-step process on how to use the Copy Data Tool in Azure Synapse to connect and transfer data from Google BigQuery to Azure Synapse.
How to Connect BigQuery to Azure Synapse?
You can manually connect BigQuery to Azure Synapse using the Copy Data tool in Azure Synapse Analytics.
Creating a Linked Service from BigQuery to Azure
- Step 1: Log in to Azure Synapse Analytics and navigate to your Synapse workspace, where you need to replicate your data. Click on the Manage button present on the left side vertical menu. Click on the Linked Service and then the “+ New” button.
- Step 2: Search for BigQuery and select the connector as your source.
- Step 3: Enter the Connection name and provide the BigQuery project ID, Client ID, Client Secret, and the Refresh Token. For the Authentication Type, this article considers “User Authentication,” which uses OAuth 2 authorization to connect your Google user account to Azure Synapse Analytics.
Generating Client ID, Secret, and Refresh Token
- Step 1: Open Google Console and select the existing BigQuery project from which you need the data. Click on the Credentials tab.
- Step 2: Navigate to CREATE CREDENTIALS > OAuth ClientID and set the application type as Desktop. Enter the name of the app and click on the CREATE button.
- Step 3: A window will appear on the screen with your ClientID and Client Secret. Copy them and save them for later use.
- Step 4: Add the client id in the following URL and then open it in your browser.
- Step 5: To get the authorization code, Google will now ask you to select the Google account you used for generating credentials. Choose the account and click on the Allow button to grant access. Now, copy the authorization code displayed before you.
- Step 6: In another tab, go to your Postman account and send a POST request as shown below for the URL: https://www.googleapis.com/oauth2/v4/token. For the “code,” enter the authorization code in the above step. After the successful response status of 200, you will get the required Refresh Token.
Finally, add the Client ID, Client Secret and the Refresh Token Azure Synapse Analytics and click the Test Connection button. After a successful test, click on the Create button to save the BigQuery to the Azure connection.
Building BigQuery to Azure Synapse Pipeline using Copy Data Tool
To get started with the Copy data tool, follow the simple steps:
- Step 1: In your Azure Synapse workspace, click the Integrate option on the left vertical menu and go to + > Copy Data Tool.
- Step 2: Enter the Task name for your BigQuery to Azure pipeline. You can also schedule the pipeline to run at a specific time by clicking the Schedule option. Once done, click on the Next button.
- Step 3: Select the BigQuery to Azure connection you created using Linked Service and click Next.
- Step 4: Choose the tables you want to replicate or write a query to get the required data and click the Next button.
Apply any filters according to your business need, click the Validate button and then the Next button.
- Step 5: Select the destination where you want to replicate the data inside your Synapse workspace and click on the Next button. Here, an SQL Pool is selected as an example.
- Step 6: You can now map your fields of the source table to the destination table and set the data types as well. Once done, click on the Next button.
- Step 7: If you need to stage your Big data, then you can select the staging location and its path. You can also choose “Polybase” when dealing with large data sets and want multiple queries to be processed parallelly. Once done, click on the Next button.
- Step 8: You will now get the summary of your pipeline and can edit the source and destination names. Click on the Next button, and the BigQuery to Azure pipeline will reach the completion phase. Once satisfied with the pipeline, click the Finish button to complete the BigQuery to Azure integration.
This manual approach of connecting BigQuery to Azure can be fruitful for the following scenarios:
- Single-Use Data Transfer: Compared to BigQuery, Azure’s consumption-based pricing might seem attractive. Though, a long-term data pipeline integration service can be expensive. Hence, the Copy Data Tool is excellent for short-term & one-time BigQuery to Azure data replication.
- No Data Transformation Required: As the Copy Data Tool offers little to no data transformation features, it isn’t advised to use it on clean, analysis-ready data.
- Integrations Within Azure: Standalone Copy Data Tool will not be enough if you need to integrate with data sources outside the Azure Ecosystem. Thus, due to its limited integrations, it is best applicable only for connecting to Azure-based data sources.
However, setting up manual connections and writing custom code to replicate data from Google BigQuery to Azure Synapse is cumbersome. Frequent breakages, pipeline errors, and lack of data flow monitoring make scaling such a system a nightmare.
An automated tool like Hevo Data is an efficient and economical choice that takes away months of manual work. It has the following benefits:
- Allows you to focus on core engineering objectives while your business teams can jump on to reporting without any delays or data dependency on you.
- Your business teams can effortlessly enrich, filter, aggregate, and segment raw Google BigQuery data with just a few clicks.
- The beginner-friendly UI saves the engineering teams’ bandwidth from tedious data preparation tasks.
What can you achieve by replicating data from BigQuery to Azure Synapse?
Transferring data from Google BigQuery to Azure Synapse allows you to leverage the following features:
- Azure Synapse is optimized for Big Data, so replicating data from BigQuery to Synapse can enable faster data processing and analysis.
- Azure Synapse is a part of the larger Azure ecosystem, so replicating data from BigQuery to Synapse can enable integration with other Azure services, such as Azure Machine Learning and Azure Data Factory.
- Azure Synapse offers a suite of analytics tools, such as SQL Server Analysis Services and Power BI, which can be used to gain insights from replicated BigQuery data.
Connecting BigQuery to Azure Synapse using the Copy Data Tool is an effective way to quickly set the integration using a beginner-friendly user interface. However, for raw data that needs multiple data transformations to convert it into an analysis-ready form, this method might not be the most optimal choice! Especially for cases when data needs to be replicated from multiple Non-Azure/Microsoft data sources.
A straightforward solution would be to ask your engineering team to make custom connectors for you and maintain these pipelines. Though, this is a time-consuming and resource-intensive task. They would need to monitor and fix any data leaks on priority constantly. To remedy this, you can simply use no-code cloud-based ETL tools like Hevo Data, which offers 150+ plug-and-play integrations. Note that Azure is an upcoming destination for Hevo. Soon you will be able to effortlessly connect BigQuery to Azure in a matter of minutes.
Visit our Website to Explore Hevo
Saving countless hours of manual data cleaning & standardizing, Hevo Data’s pre-load data transformations get it done in minutes via a simple drag n-drop interface or your custom python scripts. No need to go to your data warehouse for post-load transformations. You can simply run complex SQL transformations from the comfort of Hevo’s interface and get your data in the final analysis-ready form. We are happy to announce that we have launched Azure Synapse as a destination.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Share your experience learning how to connect BigQuery to Azure Synapse effectively! Let us know in the comments section below!
All your customer data in one place.