Azure SQL Replication: A Comprehensive Guide

Tutorials • September 2nd, 2020

Introduction

Azure SQL is a relational database provided as a completely managed service by Microsoft. Built on top of SQL Server, Azure SQL offers a well-equipped querying layer and rock-solid stability. This blog will help you carry out Azure SQL replication easily.

The managed instance relieves customers of the effort involved in maintaining an on-premises server. Since SQL Server has traditionally been sold in a license-based model, Azure SQL also requires customers to pay for a subscription license, making it slightly more expensive than open-source counterparts like MariaDB and PostgreSQL.

Pricing of Azure SQL is in terms of required processor cores and memory. It starts at approximately $0.5 per hour for a 2-core, 10 GB instance. Since SQL Server is a favourite on-premise database among enterprises, Azure SQL is a top choice when these companies move to a pure-cloud or hybrid model. In most such cases, there will be a need to perform Azure SQL replication.
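As a rough back-of-the-envelope figure, running such an instance continuously would come to about 0.5 × 730 ≈ 365 USD per month, before any storage or backup charges.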

The simplest such use case involves just a source and a target database, but it is common to find replication configurations involving 3 or 4 databases in enterprises. This is mostly done to separate transactional databases from the ones that take ETL workloads. This post details the methods that can be used to set up replication in Azure SQL.

Pre-requisites

  • An Azure SQL account with permissions to create databases.
  • Basic knowledge of commands in Azure SQL.


Hevo Data: Migrate your Data Seamlessly

Hevo is a No-code Data Pipeline. It supports pre-built data integrations from 100+ data sources, including Azure SQL. If you want to replicate your SQL data, then Hevo is the right choice for you. Hevo is a fully managed solution for your data migration. It will automate your data flow in minutes. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully-automated solution to replicate and manage data in real-time and always have analysis-ready data in your desired destination. 

Let’s look at some salient features of Hevo:

  • Fully Managed: It requires no maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Give Hevo a try by signing up for a 14-day free trial today.

Setting Up Replication in Azure SQL 

Replication in Azure SQL can be set up mainly through three methods:

  1. Using Azure SQL Data Sync
  2. Using the transactional replication feature in Azure
  3. Using Hevo – A completely managed ETL tool.

Using Azure SQL Data Sync

Azure SQL Data Sync is a service that helps in syncing data unidirectionally or bidirectionally between multiple Azure SQL databases and SQL Server instances. It involves a hub database and member databases. A sync metadata database in the same region as the hub database is also needed. You will now start configuring Data Sync through the Azure portal.
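The sync metadata database is just a regular, empty Azure SQL database. A minimal T-SQL sketch for creating one upfront is shown below, assuming the hypothetical name SyncMetadataDb (you can also let the portal create it for you during sync group setup).

-- Run against the master database of the logical server that will host it;
-- the metadata database must be in the same region as the hub database
CREATE DATABASE SyncMetadataDb;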

  1. Log in to the Azure portal and select SQL Databases.
  2. Select the database you will be using as the hub database.
  3. From the left-hand side menu, select Sync to other Databases.
  4. Select ‘New Sync group’ and add the details. Set automatic sync to ON to schedule the sync and enter your preferred schedule. For conflict resolution, select ‘Hub win’.
  5. In the next section, add the configuration details for the member databases.

If you do not already have a destination database, you can create one from this interface itself. Otherwise, add the details of your destination database. The member can be another Azure SQL database or a SQL Server instance.

  6. In the next screen, select the tables that you want to sync from the source database and click ‘Save’. If you configured a schedule in step 4, your sync will be executed at that time. Otherwise, you can start the sync manually by clicking the Sync button.
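Once a sync run completes, a quick sanity check is to compare a member database against the hub. A minimal sketch, assuming a hypothetical synced table named dbo.Customers:

-- Run this against the hub database and again against the member database;
-- after a successful sync run the counts should match
SELECT COUNT(*) AS row_count
FROM dbo.Customers;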

Azure SQL Data Sync is an intuitive service that can schedule syncs between Azure SQL databases and SQL Server instances in a few clicks. But it comes with a few drawbacks that make it non-ideal in certain situations:

  1. The sync interval can only be set to 5 minutes at a minimum. So you should expect 5 minutes or more of latency in syncing your data.
  2. It is very performance-intensive and comes with a restriction on the number of member databases.
  3. Transactional consistency is not maintained. So there will be intervals of time where your destination is not up to date or reflects a state that has diverged from your source database. This makes it a less-than-perfect solution for backups as well.

Using Azure Transactional Replication Feature

The Azure transactional replication feature eliminates most of the drawbacks mentioned in the case of Data Sync. It is a bit more involved to set up and requires you to use T-SQL commands. For the sake of simplicity, let us assume there is a table called ReplicationTest with two columns, ID and NAME. We will start the process by first setting up a destination database in your managed instance.
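If the ReplicationTest table does not already exist on the source, a minimal sketch of its definition is shown below (assuming the dbo schema; note that transactional replication requires a primary key on every published table):

-- Source table that will be published
CREATE TABLE dbo.ReplicationTest (
  ID INT NOT NULL PRIMARY KEY,
  NAME VARCHAR(100) NOT NULL
);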

  1. Connect to your instance and execute the below T-SQL to create a destination database.
USE [master]
GO
CREATE DATABASE [ReplTran_SUB]
GO
USE [ReplTran_SUB]
GO
CREATE TABLE ReplicationTest (
  ID INT NOT NULL PRIMARY KEY,
  NAME VARCHAR(100) NOT NULL,
  dt1 DATETIME NOT NULL DEFAULT getdate()
)
GO
  2. Connect to the source database and execute the below statements to configure a distributor. The distributor takes the transactions from the source and feeds them to the destination databases.
USE [master]
GO
EXEC sp_adddistributor @distributor = @@ServerName;
EXEC sp_adddistributiondb @database = N'distribution';
GO 
  3. We will now configure the source database to use the distributor. For this, you need to switch the query window to SQLCMD mode, which you can do from the Query menu by selecting SQLCMD Mode. Execute the following commands to configure the source database.
:setvar username <your_user_name>
:setvar password <your_password>
:setvar file_storage "\\<file_share_folder_name>"
:setvar file_storage_key "DefaultEndpointsProtocol=https;AccountName=<Accountname>;AccountKey=<Account_key>;EndpointSuffix=core.windows.net"
USE [master]
EXEC sp_adddistpublisher
  @publisher = @@ServerName,
  @distribution_db = N'distribution',
  @security_mode = 0,
  @login = N'$(username)',
  @password = N'$(password)',
  @working_directory = N'$(file_storage)',
  @storage_connection_string = N'$(file_storage_key)'; 
  4. Use the commands below to enable replication on the source database, create the publication, and configure the log reader and snapshot agents.
USE [<source_db_name>]
EXEC sp_replicationdboption
  @dbname = N'<source_db_name>',
  @optname = N'publish',
  @value = N'true';
-- Create your publication
EXEC sp_addpublication
  @publication = N'publish_data',
  @status = N'active';
-- Configure your log reader agent
EXEC sp_changelogreader_agent
  @publisher_security_mode = 0,
  @publisher_login = N'<your_user_name>',
  @publisher_password = N'<your_password>',
  @job_login = N'<your_username>',
  @job_password = N'<your_password>';
-- Add the publication snapshot
EXEC sp_addpublication_snapshot
  @publication = N'publish_data',
  @frequency_type = 1,
  @publisher_security_mode = 0,
  @publisher_login = N'<your_username>',
  @publisher_password = N'<your_password>',
  @job_login = N'<your_username>',
  @job_password = N'<your_password>';
  5. Add the table article and the subscriber to the publication, then initialize the snapshot.
-- Adds the ReplicationTest table to the publication
EXEC sp_addarticle
  @publication = N'publish_data',
  @type = N'logbased',
  @article = N'ReplicationTest',
  @source_object = N'ReplicationTest',
  @source_owner = N'<your_schema>';
-- Configures and adds the subscriber
EXEC sp_addsubscription
  @publication = N'publish_data',
  @subscriber = N'<destination_server_url>',
  @destination_db = N'<destination_database_name>',
  @subscription_type = N'Push';
-- Configures an agent for data push and subscription.
EXEC sp_addpushsubscription_agent
  @publication = N'publish_data',
  @subscriber = N'<destination_server_url>',
  @subscriber_db = N'<destination_database_name>',
  @subscriber_security_mode = 0,
  @subscriber_login = N'<destination_username>',
  @subscriber_password = N'<destination_password>',
  @job_login = N'<your_username>',
  @job_password = N'<your_password>';
-- Initializes the publication snapshot
EXEC sp_startpublication_snapshot
  @publication = N'publish_data';

The last command should get the replication started. You can test this by adding a row to the source database and verifying that the row has been replicated. In case you face errors, you may need to increase the connection timeout values of your source database.
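A minimal sketch of such a test, assuming the dbo schema and that the snapshot has already been delivered to the subscriber:

-- Run against the source (published) database
INSERT INTO dbo.ReplicationTest (ID, NAME)
VALUES (1, 'replication smoke test');

-- Then run against the destination database ReplTran_SUB;
-- the new row should appear within a few seconds
SELECT ID, NAME, dt1
FROM dbo.ReplicationTest
WHERE ID = 1;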

Drawbacks of this Method

The biggest advantages of this approach are low latency and transactional consistency. These capabilities come with a few drawbacks too:

  1. The biggest downside is that this method is not for the faint-hearted and requires you to work through a lot of configuration and commands. Setting it up needs database administration skills along with developer skills.
  2. This approach is best suited for replication to a SQL Server-based destination. In many use cases, replication has to be done to other databases or data warehouses.
  3. There is no option for transforming data in this method. Many ETL workflows need data transformation before moving data.

If all of those challenges create a roadblock for you, you should check out a cloud-based completely managed ETL tool like Hevo.

Hevo is a No-code Data Pipeline. It supports pre-built integrations from 100+ data sources at a reasonable price. You can automate your data flow from Azure SQL in minutes.

Sign up for a 14-day free trial today and let Hevo be a helping hand in your replication process.

Have any further queries? Get in touch with us in the comments section below.
