Are you looking to perform a detailed analysis of your data without having to disturb the production setup on SQL Server? In that case, moving data from SQL Server to a robust data warehouse like Google BigQuery is the right direction to take.
This article aims to guide you with steps to move data from Microsoft SQL Server to BigQuery, shed light on the common challenges, and assist you in navigating through them. You will explore two popular methods that you can utilize to set up Microsoft SQL Server to BigQuery migration.
Prerequisites
- An active Microsoft SQL Server account.
- An active Google Cloud Platform account.
- Working knowledge of Databases and Data Warehouses.
- Working knowledge of SQL queries.
- Clear idea regarding the type of data to be transferred.
Introduction to Microsoft SQL Server
Microsoft SQL Server, a relational database management system developed by Microsoft, provides strong core functions to store, manage, and retrieve information. It uses SQL (Structured Query Language) to manage the database and query the data in it.
Microsoft SQL Server runs on a central server, which allows concurrent user access. The collected data is stored on the server, managed by the RDBMS, and queried using SQL. Microsoft SQL Server supports various business intelligence operations and transaction processing, and it also provides end-to-end security for sensitive business data.
Introduction to Google BigQuery
Google BigQuery is a serverless, cost-effective, highly scalable data warehousing platform with built-in machine learning features and an in-memory BI Engine for fast analysis. It combines rapid SQL queries with the processing capacity of Google’s infrastructure, letting you manage data across several datasets and control who can view and query it.
Many companies, including UPS, Twitter, and Dow Jones, use BigQuery. UPS uses BigQuery to forecast the exact volume of packages across its various offerings, while Twitter uses it to help update ads and combine millions of data points per second.
Method 1: Manual ETL Process to Set Up Microsoft SQL Server to BigQuery Integration
This method uses SQL Server Management Studio (SSMS) to set up the integration. It requires you to export the data in CSV format and then load it into BigQuery, which demands significant engineering bandwidth and working knowledge of SQL queries.
Method 2: Using Hevo Data to Set Up Microsoft SQL Server to BigQuery Integration
Hevo Data is an automated Data Pipeline platform that can move your data from Microsoft SQL Server to BigQuery very quickly without writing a single line of code. It is simple, hassle-free, and reliable.
Moreover, Hevo offers a fully-managed solution to set up data integration from Microsoft SQL Server and 150+ other data sources (including 50+ free data sources) and will let you directly load data to Data Warehouses such as Google BigQuery, or the destination of your choice.
It will automate your data flow in minutes without writing any line of code. Its Fault-Tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.
Get Started with Hevo for Free
Methods to Set Up Microsoft SQL Server to BigQuery Integration
Broadly, there are two ways to migrate your data from Microsoft SQL Server to BigQuery: the manual ETL process using SSMS, and an automated pipeline using Hevo Data.
For the scope of this blog post, Method 1 will be covered extensively. Towards the end, you can also find the limitations of this method and ways to overcome them.
Method 1: Manual ETL Process to Set Up Microsoft SQL Server to BigQuery Integration
The steps to execute the custom code are as follows:
Step 1: Export the Data from SQL Server using SQL Server Management Studio (SSMS)
SQL Server Management Studio (SSMS) is a free tool built by Microsoft that provides an integrated environment for managing any SQL infrastructure. SSMS is used to query, design, and manage your databases from your local machine. In the steps below, we will use SSMS to extract the data in Comma-Separated Values (CSV) format.
- Install SSMS if you don’t have it on your local machine. You can download it from Microsoft’s website.
- Open SSMS and connect to your SQL Server instance. In the Object Explorer window, right-click the database you want to export, point to Tasks, and choose the Export Data option.
- The welcome page of the SQL Server Import and Export Wizard will open. Click Next to proceed with exporting the required data.
- You will see a window to choose a data source. Select your preferred data source.
- In the Server name drop-down list, select a SQL Server instance.
- In the Authentication section, select the authentication mode for the data source connection. Next, from the Database drop-down box, select the database from which the data will be copied, then click Next.
- The next window is the Choose a Destination window, where you specify where the exported data will be written. In the Destination drop-down box, select the Flat File Destination item.
- In the File name box, specify the CSV file to which the data from the SQL database will be exported, then click Next.
- The next window is the Specify Table Copy or Query window. Choose Copy data from one or more tables or views to export all the data from a table.
- Next, in the Configure Flat File Destination window, select the source table whose data will be exported to the CSV file you specified earlier.
- At this point, you can click Preview to see a sample of the data that is about to be exported.
- Click Next; the Save and Run Package window will pop up. Click Next again.
- The Complete the Wizard window appears next, giving you an overview of all the choices you made during the export. Click Finish to run the export.
- The exported CSV file will be saved to the local path you specified.
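If you prefer to script this export instead of clicking through the wizard, the bcp command-line utility that ships with SQL Server can write query results straight to a CSV file. A minimal sketch, with the server, database, and table names as placeholders (note that bcp does not quote fields, so commas inside values need extra handling):
bcp "SELECT * FROM dbo.your_table" queryout export.csv -c -t, -S your_server -d your_database -T
Here -c exports character data, -t, sets a comma as the field terminator, and -T uses Windows authentication; swap -T for -U/-P if you authenticate with SQL credentials.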
Step 2: Upload to Google Cloud Storage
After exporting the data to your local machine, the next step in moving from SQL Server to BigQuery is to transfer the CSV file to Google Cloud Storage (GCS). There are various ways of achieving this, but for the purpose of this blog post, let’s discuss the following methods.
Method 1: Using Gsutil
gsutil is a Python-based GCP tool that gives you access to GCS from the command line. To set up gsutil, follow the quickstart guide in the Google Cloud documentation. gsutil provides a straightforward way to upload a file to GCS from your local machine. To create a bucket to copy your file into, run:
gsutil mb gs://my-new-bucket
The new bucket created is called "my-new-bucket". Your bucket name must be globally unique. If successful, the command returns:
Creating gs://my-new-bucket/...
To copy your file to GCS:
gsutil cp export.csv gs://my-new-bucket/destination/export.csv
In this command, “export.csv” refers to the file you want to copy. “gs://my-new-bucket” represents the GCS bucket you created earlier. Finally, “destination/export.csv” specifies the destination path and filename in the GCS bucket where the file will be copied to.
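To confirm the upload worked, you can list the object and check its size. A quick sketch, assuming the same bucket and path used above:
gsutil ls -l gs://my-new-bucket/destination/export.csv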
Method 2: Using Web Console
The web console is another alternative you can use to upload your CSV file to the GCS from your local machine. The steps to use the web console are outlined below.
- First, log in to your GCP account. Open the navigation (hamburger) menu, select Storage, and click Browser in the left tab.
- In order to store the file that you will upload from your local machine, create a new bucket. Make sure the name chosen for the bucket is globally unique.
- The bucket you just created will appear on the window, click on it and select upload files. This action will direct you to your local drive where you will need to choose the CSV file you want to upload to GCS.
- As soon as you start uploading, a progress bar is shown. The bar disappears once the process has been completed. You will be able to find your file in the bucket.
Step 3: Upload Data to BigQuery From GCS
BigQuery is where the data analysis you need will be carried out. Hence you need to upload your data from GCS to BigQuery. There are various methods that you can use to upload your files from GCS to BigQuery. Let’s discuss 2 methods here:
Method 1: Using the Web Console UI
- The first step when using the Web UI method is to select BigQuery from the hamburger menu on the GCP home page.
- Select the “Create a new dataset” icon and fill in the corresponding drop-down menu.
- Create a new table under the data set you just created to store your CSV file.
- On the Create Table page, in the Source section, select Google Cloud Storage, browse to your bucket, and select the CSV file you uploaded to GCS. Make sure the File Format is set to CSV.
- Fill in the destination dataset and destination table name.
- Under Schema, enable the auto-detect option.
- Click Create table.
- After creating the table, click on the destination table name you created to view your exported data file.
Method 2: Using the Command-Line Interface
The Activate Cloud Shell icon takes you to the command-line interface. From there, you can either let BigQuery auto-detect your schema or specify it explicitly. An example of each is shown below:
bq load --autodetect --source_format=CSV your_dataset.your_table gs://your_bucket/your_file.csv
bq load --source_format=CSV your_dataset.your_table gs://your_bucket/your_file.csv ./schema.json
In the first command, the --autodetect flag lets BigQuery infer the schema from the CSV file. In the second, schema.json refers to a file containing the schema definition for your CSV file; you can customize it to match the structure of your data.
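If the destination dataset does not exist yet, you can create it from the same command line before loading; bq load creates the table automatically if needed. A quick sketch, assuming a US location and your_dataset as a placeholder name:
bq --location=US mk --dataset your_dataset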
There are 3 ways to write to an existing table in BigQuery. You can use any of them to write to your table. The options are illustrated below.
1. Overwrite the data
To overwrite the data in an existing table, you can use the --replace flag in the bq command. Here’s an example:
bq load --replace --source_format=CSV your_dataset.your_table gs://your_bucket/your_file.csv
In the above code, the --replace flag ensures that the existing data in the table is replaced with the new data from the CSV file.
2. Append the table
To append data to an existing table, you can use the --noreplace flag in the bq command. Here’s an example:
bq load --noreplace --source_format=CSV your_dataset.your_table gs://your_bucket/your_file.csv
The --noreplace flag ensures that the new data from the CSV file is appended to the existing data in the table.
3. Add a new field to the target table
To add a new field (column) to the target table, you can use the bq update command with an updated schema file; the extra field will be added to the table’s schema. Here’s an example:
bq update your_dataset.your_table schema.json
In the above code, schema.json refers to the file containing the updated schema definition with the new field. You need to modify the schema.json file to include the new field and its corresponding data type.
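For reference, a BigQuery schema file is simply a JSON array of column definitions. A minimal illustration, using hypothetical id and value columns:
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "value", "type": "STRING", "mode": "NULLABLE"}
]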
Please note that these examples assume you have the necessary permissions and have set up the required authentication for interacting with BigQuery.
Step 4: Update the Target Table in BigQuery
GCS acts as a staging area for BigQuery, so when you are using Command-Line to upload to BigQuery, your data will be stored in an intermediate table. The data in the intermediate table will need to be updated for the effect to be shown in the target table.
There are two ways to update the target table in BigQuery.
1. Update the rows in the final table and insert new rows from the intermediate table.
UPDATE final_table t SET t.value = s.value
FROM intermediate_data_table s
WHERE t.id = s.id;
INSERT INTO final_table (id, value)
SELECT id, value
FROM intermediate_data_table
WHERE id NOT IN (SELECT id FROM final_table);
In the above code, final_table refers to the name of your target table, and intermediate_data_table refers to the name of the intermediate table where your data is initially loaded.
2. Delete from the final table all the rows that also exist in the intermediate table, then insert the rows from the intermediate table (see the sketch after this step).
DELETE FROM final_table
WHERE id IN (SELECT id FROM intermediate_data_table);
In the above code, final_table refers to the name of your target table, and intermediate_data_table refers to the name of the intermediate table where your data is initially loaded. Please make sure to replace final_table and intermediate_data_table with the actual table names you are working with.
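After the delete, the usual next step is to insert the rows from the intermediate table so the final table reflects the latest data. A minimal sketch, assuming the same id and value columns used above:
INSERT INTO final_table (id, value)
SELECT id, value
FROM intermediate_data_table;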
This marks the completion of the SQL Server to BigQuery connection. You can now sync your CSV files into a GCS bucket to integrate SQL Server with BigQuery and supercharge your analytics to get insights from your SQL Server database.
Limitations of Manual ETL Process to Set Up Microsoft SQL Server to BigQuery Integration
Businesses need to put systems in place that will enable them to gain the insights they need from their data. These systems have to be seamless and rapid. Using custom ETL scripts to connect MS SQL Server to BigQuery has the following limitations that will affect the reliability and speed of these systems:
- Writing custom code is only ideal if you’re looking to move your data once from Microsoft SQL Server to BigQuery.
- Custom ETL code does not scale well with streaming and real-time data. You will have to write additional code to update your data, which is far from ideal.
- When there’s a need to transform or encrypt your data, custom ETL code fails as it will require you to add additional processes to your pipeline.
- Maintaining and managing a running data pipeline such as this will need you to invest heavily in engineering resources.
While writing code to move data from SQL Server to BigQuery looks like a no-brainer in the beginning, the implementation and management are much more nuanced than that. The process has a high propensity for errors, which will, in turn, have a huge impact on data quality and consistency.
Method 2: Using Hevo Data to Set Up Microsoft SQL Server to BigQuery Integration
Hevo Data, a No-code Data Pipeline, helps you directly transfer data from Microsoft SQL Server and 150+ sources to Google BigQuery and other Data Warehouses, Databases, BI tools, or a destination of your choice in a completely hassle-free & automated manner.
Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
Hevo Data takes care of all your Data Preprocessing needs and lets you focus on key business activities and draw a much more powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination.
Sign up here for a 14-Day Free Trial!
The steps to load data from Microsoft SQL Server to BigQuery using Hevo Data are as follows:
- Connect your Microsoft SQL Server account to Hevo’s platform. Hevo has an in-built Microsoft SQL Server Integration that connects to your account within minutes.
- Select Google BigQuery as your destination and start moving your data.
With this, you have successfully set up Microsoft SQL Server to BigQuery Integration using Hevo Data.
Here are more reasons to try Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Conclusion
This article gave you a comprehensive guide to setting up Microsoft SQL Server to BigQuery integration using 2 popular methods. It also gave you a brief overview of Microsoft SQL Server and Google BigQuery, and highlighted the limitations associated with the custom ETL method.
With Hevo, you can achieve simple and efficient data replication from Microsoft SQL Server to BigQuery. Hevo can help you move data not just from SQL Server but from 150+ other data sources.
Visit our Website to Explore Hevo
Businesses can use automated platforms like Hevo Data to set up this integration and handle the ETL process. It helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tool, or any other desired destination in a fully automated and secure manner, without having to write any code, and provides a hassle-free experience of connecting your SQL Server to your BigQuery instance.
Want to try Hevo? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. Have a look at our unbeatable pricing, which will help you choose the right plan for you.
Share your experience of loading data from Microsoft SQL Server to BigQuery in the comment section below.