Press "Enter" to skip to content

Microsoft SQL Server to BigQuery – Steps to Move Data

Are you looking to perform detailed analysis of your data without disturbing the production setup on SQL Server? In that case, moving data from SQL Server to a robust data warehouse like Google BigQuery is the right direction to take. This article guides you through the steps to move data from Microsoft SQL Server to BigQuery, sheds light on the common challenges, and helps you navigate through them.

Understanding SQL Server and BigQuery

Microsoft SQL Server is a relational database management system (RDBMS) developed by Microsoft that provides strong core functions to store, manage, and retrieve information. It uses SQL (Structured Query Language) to manage the database and query the data in it. Microsoft SQL Server runs on a central server, which allows for concurrent user access; the data is stored on the server, accessed through the RDBMS, and queried using SQL. Microsoft SQL Server supports various business intelligence operations and transaction processing, and it also provides end-to-end security for sensitive business data.
Google BigQuery is a data warehouse solution from Google’s suite of cloud computing services. BigQuery is based on a serverless model that gives you a high level of abstraction, freeing you from maintaining physical infrastructure or employing database administrators. BigQuery has a very flexible parallel compute engine that lets you scale to thousands of cores in a few seconds. BigQuery stores data in a columnar structure: when a query is run, BigQuery reads only the columns involved, which ensures very efficient CPU usage. BigQuery is also very cost-effective, as you only pay for the queries you run. Data in BigQuery is encrypted by default, so security is well taken care of.

Methods to move data from Microsoft SQL Server to Google BigQuery

There are two popular methods to perform SQL Server to BigQuery data replication:

Method 1: Use an easy-to-use Data Pipeline Platform like Hevo Data

Method 2: Write custom ETL code to move data from Microsoft SQL Server to BigQuery

For the scope of this blog post, Method 2 will be covered extensively. Towards the end, you can also find the limitations of this method and ways to overcome them.

Migrating data from Microsoft SQL Server to Google BigQuery Using Custom ETL code:

The steps to execute the custom code are as follows:
  1. Export the data from SQL Server using SQL Server Management Studio (SSMS)
  2. Upload to Google Cloud Storage
  3. Upload to BigQuery from Google Cloud Storage (GCS)
  4. Update the target table in BigQuery

Step 1: Export the data from SQL Server using SQL Server Management Studio (SSMS)

SQL Server Management Studio (SSMS) is a free tool built by Microsoft that provides an integrated environment for managing any SQL infrastructure. SSMS is used to query, design, and manage your databases from your local machine. In the steps below, we are going to use SSMS to extract our data in Comma Separated Values (CSV) format.
  1. Install SSMS if you don’t have it on your local machine. You can download it from the Microsoft website.
  2. Open SSMS and connect to your SQL Server instance. In the Object Explorer window, right-click the database you want to export, navigate to the Tasks sub-menu, and choose the Export Data option.
  3. The welcome page of the SQL Server Import and Export Wizard will open. Click Next to proceed with exporting the required data.
  4. You will see a window to choose a data source. Select your preferred data source.
  5. In the Server name dropdown list, select a SQL Server instance.
  6. In the Authentication section, select the authentication mode for the data source connection. Next, from the Database drop-down box, select the database from which the data will be copied. Once done, select Next.
  7. The next window is the Choose a Destination window, where you specify where the data from SQL Server will be copied to. Under the Destination drop-down box, select the Flat File Destination item.
  8. In the File name box, specify the CSV file to which the data from the SQL database will be exported, and select the Next button.
  9. The next window you will see is the Specify Table Copy or Query window. Choose 'Copy data from one or more tables or views' to get all the data from the table.

  10. Next, you will see the Configure Flat File Destination window. Select the source table whose data should be exported to the CSV file you specified earlier.
  11. At this point, you can click on Preview to have a sneak peek at the data you are about to export.
  12. Complete the export process by hitting Next. The Save and Run Package window will pop up; click Next.
  13. The Complete the Wizard window will appear next; it gives you an overview of all the choices you made during the export process. To complete the process, hit Finish.
  14. The exported CSV file will be found in the local drive location you specified.
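
If you prefer to script the export instead of clicking through the SSMS wizard, the same result can be achieved with a short Python script. The sketch below uses the pyodbc library; the server, database, credentials, and table names (your_server, your_database, dbo.your_table) are placeholders you would replace with your own.

# Minimal sketch: export a SQL Server table to CSV with Python.
# Assumes pyodbc is installed and the "ODBC Driver 17 for SQL Server" driver is available;
# server/database/credential/table names are placeholders.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your_server;DATABASE=your_database;UID=your_user;PWD=your_password"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM dbo.your_table")

with open("export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])  # header row
    for row in cursor:
        writer.writerow(row)

conn.close()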

Step 2: Upload to Google Cloud Storage

After exporting the data to your local machine, the next step is to transfer the CSV file to Google Cloud Storage (GCS). There are various ways of achieving this, but for the purpose of this blog post, let’s discuss the following two methods.

Method 1: Using Gsutil

gsutil is a Python-based command-line tool from Google Cloud that gives you access to GCS from the command line. To set up gsutil, follow Google's quickstart guide. gsutil makes it easy to upload a file to GCS from your local machine.

To create a bucket to copy your file to:
gsutil mb -l us-east1 gs://my-new-bucket/
The new bucket created is called my-new-bucket. Your bucket name must be globally unique.
 
If successful the command returns:
Creating gs://my-new-bucket/...
To copy your file to GCS:
gsutil cp export.csv gs://my-new-bucket/path/to/folder/ 
Where export.csv is the CSV file you want to copy to the new bucket you just created.
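
If you would rather perform the upload from a script than from the gsutil command line, a rough equivalent using the google-cloud-storage Python client is sketched below. The bucket and object names mirror the gsutil example above and are only placeholders.

# Minimal sketch: upload the exported CSV to GCS with the google-cloud-storage client.
# Assumes the library is installed and application default credentials are configured;
# bucket and object names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-new-bucket")              # bucket created earlier
blob = bucket.blob("path/to/folder/export.csv")      # destination object path in GCS
blob.upload_from_filename("export.csv")              # local CSV file from Step 1
print(f"Uploaded to gs://{bucket.name}/{blob.name}")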

Method 2: Using Web console

The web console is another alternative you can use to upload your CSV file to GCS from your local machine. The steps to use the web console are outlined below.

  1. First, log in to your GCP account. Click on the hamburger menu, which displays a drop-down menu. Select Storage and click on Browser in the left tab.
  2. In order to store the file that you will upload from your local machine, create a new bucket. Make sure the name chosen for the bucket is globally unique.

  3. The bucket you just created will appear on the window, click on it and select upload files. This action will direct you to your local drive where you will need to choose the CSV file you want to upload to GCS.
  4. As soon as you start uploading, a progress bar is shown. The bar disappears once the process has been completed. You will be able to find your file in the bucket.

Step 3: Upload data to BigQuery From GCS

BigQuery is where the data analysis you need will be carried out, so you now need to upload your data from GCS to BigQuery. There are various methods that you can use to do this. Let’s discuss two of them here:

Method 1: Using the Web Console UI

  1. To use the Web UI method, first select BigQuery under the hamburger menu on the GCP home page.
  2. Select the Create Dataset icon and fill in the corresponding fields.
  3. Create a new table under the dataset you just created to store your CSV file.
  4. On the Create Table page, in the Source section, select Google Cloud Storage, browse your bucket, and select the CSV file you uploaded to GCS.
    – Make sure the File Format is set to CSV
    – Fill in the destination dataset and the destination table name
    – Under Schema, click on auto detect
    – Select Create Table
  5. After creating the table, click on the destination table name you created to view your exported data file.
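
If you would rather run this load from a script than the web UI, a rough equivalent using the google-cloud-bigquery Python client is sketched below. The dataset, table, and GCS URI names are placeholders.

# Minimal sketch: load a CSV from GCS into BigQuery with schema auto-detection.
# Assumes google-cloud-bigquery is installed and credentials are configured;
# dataset/table/bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the CSV header row
    autodetect=True,       # equivalent to "auto detect" schema in the UI
)
load_job = client.load_table_from_uri(
    "gs://my-new-bucket/your_data.csv",
    "your_dataset.your_table",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
table = client.get_table("your_dataset.your_table")
print(f"Loaded {table.num_rows} rows into your_dataset.your_table")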

Method 2: Using the Command-Line Interface

The Activate Cloud Shell icon will take you to the command-line interface. The bq load command-line syntax is shown below:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

[LOCATION] is an optional parameter and represents your Location.
[FORMAT] is to be set to CSV.
[DATASET] represents an existing dataset.
[TABLE] is the table name into which you're loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema must be a local JSON file or inline. 
Note: You can use the autodetect flag to specify your schema.
Your schema can be specified on the command line. An example is shown below:
bq --location=US load --source_format=CSV your_dataset.your_table gs://my-new-bucket/your_data.csv ./your_schema.json
You can also use the auto-detect feature to specify your schema, as shown below:
bq --location=US load --autodetect --source_format=CSV your_dataset.your_table gs://my-new-bucket/data.csv
 
There are three ways to write to an existing table in BigQuery. You can make use of any of them to write to your table. The options are illustrated below:
1. Overwrite the data
bq --location=US load --autodetect --replace --source_format=CSV your_dataset.your_table gs://bucket_name/path/to/file/your_file_name.csv
2. Append to the table
bq --location=US load --autodetect --noreplace --source_format=CSV your_dataset.your_table gs://bucket_name/path/to/file/your_file_name.csv ./schema_file.json
3. Add a new field to the target table. An extra field will be added to the schema.
bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV your_dataset.your_table gs://mybucket/your_data.csv ./your_schema.json
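
If you are loading through the Python client instead of the bq tool, the three write behaviours above map to LoadJobConfig settings, roughly as sketched below. Dataset, table, and file names are placeholders.

# Minimal sketch: the bq flags above expressed as LoadJobConfig settings
# (--replace, --noreplace, --schema_update_option=ALLOW_FIELD_ADDITION).
from google.cloud import bigquery

client = bigquery.Client()

overwrite = bigquery.LoadJobConfig(          # 1. overwrite the data (--replace)
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
append = bigquery.LoadJobConfig(             # 2. append to the table (--noreplace)
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
add_field = bigquery.LoadJobConfig(          # 3. allow a new field to be added to the schema
    source_format=bigquery.SourceFormat.CSV,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)

# Example: run an append load (placeholders for bucket, dataset, and table).
client.load_table_from_uri(
    "gs://my-new-bucket/your_data.csv", "your_dataset.your_table", job_config=append
).result()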

Step 4: Update the Target Table in BigQuery

GCS acts as a staging area for BigQuery, so when you use the command line to upload data to BigQuery, it is first stored in an intermediate (staging) table. The data in the intermediate table then needs to be applied to the target table for the changes to take effect. There are two ways to update the target table in BigQuery; both are explained below.

1. Update the rows in the final table and insert new rows from the intermediate table
UPDATE final_table t SET t.value = s.value FROM intermediate_data_table s WHERE t.id = s.id;
INSERT INTO final_table (id, value) SELECT id, value FROM intermediate_data_table WHERE id NOT IN (SELECT id FROM final_table);

2. Delete the rows from the final table that are also present in the intermediate table, then insert all the rows from the intermediate table

DELETE FROM data_set_name.final_table f WHERE f.id IN (SELECT id FROM data_set_name.intermediate_data_table);
INSERT INTO data_set_name.final_table (id, value) SELECT id, value FROM data_set_name.intermediate_data_table;
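
BigQuery's standard SQL also supports MERGE, which performs the update-or-insert in a single statement. The sketch below runs such a statement through the Python client; the dataset, table, and column names are placeholders matching the examples above.

# Minimal sketch: upsert from the intermediate (staging) table into the final table
# with a single MERGE statement (dataset/table/column names are placeholders).
from google.cloud import bigquery

client = bigquery.Client()
merge_sql = """
MERGE `data_set_name.final_table` t
USING `data_set_name.intermediate_data_table` s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value)
"""
client.query(merge_sql).result()  # wait for the MERGE to complete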

Limitations and Challenges with writing Custom Code to move data from Microsoft SQL Server to Google BigQuery

 
Businesses need to put systems in place that will enable them to gain the insights they need from their data. These systems have to be seamless and rapid. Using custom ETL scripts has the following limitations that will affect the reliability and speed of these systems:
  1. Writing custom code is only ideal if you’re looking to move your data once from Microsoft SQL Server to BigQuery
  2. Custom ETL code does not scale well with stream and real-time data. You will have to write additional code to update your data. This is far from ideal
  3. When there’s a need to transform or encrypt your data, custom ETL code fails as it will require you to add additional processes to your pipeline
  4. Maintaining and managing a running data pipeline such as this will need you to invest heavily in valuable engineering resources
While writing code to move data from SQL Server to BigQuery looks like a no-brainer at the beginning, the implementation and management are much more nuanced than that. The process has a high propensity for errors, which will, in turn, have a huge impact on data quality and consistency.

An Easier Way to Move Data from SQL Server to BigQuery

Using a fully managed, easy-to-use Data Pipeline platform like Hevo, you can load your data from SQL Server to BigQuery in a matter of minutes. You can achieve this in a no-code, point-and-click environment. Here are the steps to replicate SQL Server to BigQuery using Hevo:

  • Connect to your SQL Server
  • Select the replication mode: (i) Full dump and load (ii) Incremental load for append-only data (iii) Incremental load for mutable data
  • Configure your BigQuery Warehouse and move data

With Hevo, you can achieve simple and efficient data replication from Microsoft SQL Server to BigQuery. Hevo can help you move data not just from SQL Server but from hundreds of additional data sources.

Sign up for a 14-Day Free Trial with Hevo and experience seamless, hassle-free data migration from SQL Server to BigQuery.
