With growing volumes of data, is your SQL Server getting slow for analytical queries? Are you simply migrating data from MS SQL Server to Redshift? Whatever your use case, we appreciate your smart move to transfer data from MS SQL Server to Redshift.
This article, in detail, covers the various approaches you could use to perform data replication from MS SQL Server to Redshift.
What is Microsoft SQL Server?
Microsoft SQL Server is one of the world’s most popular and widely used relational database management system. With great security, reliability, and, ease of connecting to multiple applications across a network, SQL Server is a popular choice for OLTP systems.
What is Amazon Redshift?
Amazon Redshift data warehouse is an enterprise-class relational database query and management system. Amazon Redshift achieves efficient storage and optimum query performance through a combination of massively parallel processing, columnar data storage, and very efficient, targeted data compression encoding schemes. Read more about Redshift Architecture here.
Approaches to transfer data from SQL Server to Redshift
There are two approaches to perform Microsoft SQL Server to Redshift replication.
Method 1: A ready to use Hevo Data Integration Platform (7 Days Free Trial).
Method 2: Write custom ETL code using Bulk Export Command-line Utility.
This article covers the steps involved in writing custom code to load data from SQL Server to Amazon Redshift. Towards the end, the blog also covers the limitations of this approach.
Transfer Data from SQL Server to Redshift using Custom Code
As a pre-requisite to this process, you will need to have installed Microsoft BCP command line utility. If you have not installed it, here is the link to download.
For the purpose of demonstration, let us assume that we need to move the ‘orders’ table from ‘sales’ schema into Redshift. This table is populated with the customer orders that are placed on a daily basis.
There might be two cases you will consider while transferring data.
- Move data for one time into Redshift.
- Incrementally load data into Redshift. (when the data volume is high)
Let us look at both scenarios:
One Time Load
You will need to generate the .txt file of the required SQL server table using BCP command as follows :
Open the command prompt and go the below path to run BCP command
C:\Program Files <x86>\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn
Run BCP command to generate the output file of the SQL server table Sales
bcp "sales.orders" out D:\out\orders.txt -S "ServerName" -d Demo -U UserName -P Password -c
Note: There might be several transformations required before you load this data into Redshift. Achieving this using code will become extremely hard. A tool like Hevo, that provides an easy environment to write transformations, might be the right thing for you.
Upload above generated .txt file to S3 Bucket
We can upload files from local machine to AWS using several ways. One simple way is to upload it using file upload utility of S3. This is a more intuitive alternative.
You can also achieve this AWS CLI, which provides easy commands to upload it to S3 bucket from local machine.
As a pre-requisite, you will need to install and configure AWS CLI if you have not already installed and configured it. You can refer to the user guide to know more about installing AWS CLI.
Run the following command to upload the file into S3 from the local machine
aws s3 cp D:\orders.txt s3://s3bucket011/orders.txt
Create Table schema before loading the data into Redshift
CREATE TABLE sales.orders (order_id INT, customer_id INT, order_status int, order_date DATE, required_date DATE, shipped_date DATE, store_id INT, staff_id INT )
After running the above query, a table structure will be created within Redshift with no records in it. To check this, run the following query:
Select * from sales.orders
Using copy command load the data from S3 to Redshift
COPY dev.sales.orders FROM 's3://s3bucket011/orders.txt' iam_role 'Role_ARN' delimiter '\t';
You will need to confirm if the data has loaded successfully. You can do that by running the query.
Select count(*) from sales.orders
This should return the total number of records inserted.
Limitations of the Custom ETL Scripts Approach:
- In cases where data needs to be moved once or in batches only, the custom ETL script method works well. This approach becomes extremely tedious if you have to copy data from MS SQL to Redshift in real-time.
- In case you are dealing with huge amounts of data, you will need to perform incremental load. Incremental load (change data capture) becomes hard as there are additional steps that you need to follow to achieve it.
- Transforming data before you load it into Redshift will be extremely hard to achieve.
- When you write code to extract a subset of data often those scripts break as the source schema keeps changing or evolving. This can result in data loss.
The process mentioned above is frail, erroneous and often hard to implement and maintain. This will impact the consistency and availability of your data into Amazon Redshift.
There is an easier way to replicate data from MS SQL Server to Redshift
A ready to use Hevo Data Integration Platform helps you migrate this data without having to write any code. This is how the process will look like when done through Hevo.
- Connect to your MS SQL Server.
- Select the replication mode: (a) Full Dump and Load (b) Incremental load for append-only data (c) Incremental load for mutable data.
- For each table in MS SQL Server choose a table name in Redshift where it should be copied.
That’s it! You are all set. Hevo will take care of fetching your data in an incremental fashion and will upload that seamlessly from MS SQL Server to Redshift via real-time data pipeline.
Additionally, you can bring data from 100s of different sources – databases, cloud applications, SDKs, and more with Hevo Data Integration Platform. This will future proof your data integration set up and provide the flexibility to immediately replicate data from any source into Redshift.
Sign-up for a 7 Day Free Trial to see how easy it is to load data from MS SQL Server to Redshift (and 100s of other sources) using Hevo.