MySQL BinLog Based Replication: How to Set Up Real-time Data Load

on Tutorials • April 9th, 2020 • Write for Hevo

Are you looking for a step by step approach to set up MySQL BinLog based replication? If yes, you have landed on the right blog. Before we get to the steps, let’s first take a look at what is MySQL BinLog based replication and the need for it.

Why do we use MySQL replication?

In business, there are many instances where MySQL databases have to be replicated in real-time. Here are some examples:

  • Data Warehousing – A typical use case is when a separate database other than the transactional database is used for warehousing. 
  • Load Distribution & Data Scaling – Distributing the load among multiple slaves to improve performance.
  • Data Backup – Since the data is replicated, it is possible to take back-up and ensure the security of the system. The multiple replications ensure the recovery of important data in case if a node is compromised due to some reason. This can be accomplished in highly critical domains like finance or banking however there are extra measures that need to be taken to maintain data consistency along with additional storage.
  • Data Analytics – The replicated data on slave servers is used to perform analysis & derive business insights. Our article will mainly evolve around this use-case time-to-time. 

Based on business requirements, you can replicate all databases, selected databases, or even selected tables within a database.

Methods to Set up MySQL Replication Using BinLog

  1. Building Custom Code Using Binary Log Files – This method works based on the data update events. These events are captured in the binary log file which is created on the master server. The critical aspect of this method is that the log files, positions in them and data update events to be transferred real-time between master and slave.
  2. Using an Automated Data Pipeline, Hevo Data – This is a no-code alternative to achieve the same outcome without having to invest in engineering bandwidth. Hevo can help you set up MySQL BinLog based replication instantly on a point and click interface.

This blog focuses on describing BinLog based replication in great detail. Let’s get started.

MySQL BinLog Based Replication: Steps to Configure

What is a Log File?

It is a file in a database that records the changes made to the database. A single change is known as a log message. For example, an UPDATE to a record would be a log message.

The events contain the actual data that should be carried from the master to the slave.

In this, a MySQL source instance that operates as the master writes database record updates and changes events or logs to the binary log.

The events or logs are recorded in different formats in the binary log file. The logging formats could be statement-based, row-based or mix of both.

Once the log files are available for slaves to be processed, the slaves are put together in such a way to read the binary log from the master and to carry out the events on the slave’s local (destination) database.

A copy of the binary log file is sent from the master to all slaves that are configured in the system. As per the requirements & business logic, it can be decided which statements need to be executed from the binary file. All the events can be executed too.

Depending on the database structure & tables, the slaves or destinations are also configured to process only the specific events as per need. The binary log coordinates aspects such as file name, processed parts (position) of the file are recorded on each slave.

Multiple slaves can be configured to execute different parts of the same log file. With this, the slave can be controlled & the execution is performed independently without affecting the operations of master & other slaves. For example – you can disconnect, connect or resume operations of a slave independently.

The configuration between the master & each slave is to be done with a unique ID using the server_id variable. Each slave should also have the details about master hostname, log file name & position within that file. The master info repository of the slave is used to save these details where it could be either a file or a table. In cases, using the CHANGE MASTER TO statement on the slave can be helpful.  

MySQL Replication Setup

The steps to set up MySQL Binlog based replication are as follows:

Step 1: Enable Binary Logging and Configure Master

This step is about enabling binary log and configuring a unique server ID on the master. This needs to be accomplished using log-bin & server-id options available in the configuration file at {mysqld] section. Each server whether it is a master or slave of the replication structure needs to be configured with a unique server id.

  1. You will need to shut down the MySQL server to edit the my.cnf or my.ini file.
  2. Here is how you can add log-bin & server-id options to enable the binary logging

     

    [mysqld]
    
    log-bin=mysql-bin
    
    server-id=1
  3. Once the changes are done to the configuration file, you will need to restart the server.

Step 2: User Creation for Replication

We will need to have the slave user accounts defined on the master server by granting ‘replication slave’ permissions. 

  1. To create a new user account, use the CREATE USER command.

     

    mysql> CREATE USER ‘username’@’%.domain.com’ IDENTIFIED BY ‘password’;
  2. Newly or existing created accounts should have replication privileges granted

     

    mysql> GRANT REPLICATION SLAVE ON *.* TO ‘username’@’%. domain.com’;

Step 3: Getting Binary Log Coordinates from the Replication Master

To start the replication process, we should know the master’s current coordinates within its binary log to configure the slave.

To find the master binary log coordinates, follow below steps:

  1. Get started with a session on the master. With the command-line client, execute the FLUSH TABLES WITH READ LOCK statement:
    mysql> FLUSH TABLES WITH READ LOCK;

    This will flush all tables and block write statements.

  2. Execute the SHOW MASTER STATUS statement in another session to obtain the current binary log file name and position:

     

    mysql>SHOW MASTER STATUS;

These are replication coordinates in the form of bin log file & position as shown in the above example. These values will need to be used while configuring the slaves.

Step 4: Creating a Data Snapshot

We can create a dump of all the tables and or databases at the master server and import the same on the slave by using mysqldump. This will need to be performed before the replication task gets started.

In this command, the data is dumped into the file ‘repl_dbdump.db’ with –master-data option:

shell> mysqldump --all-databases --master-data > repl_dbdump.db
  1. If you want to exclude any tables in the database, use –ignore-table option.
  2. If you want to name only specific databases which you want to be dumped, use the –databases option

Step 5: Replication Slave Configuration

Similar to the master, the slaves will also need to be set up with a unique server id so that it does not conflict. The same process will need to be followed to edit the configuration file as indicated in Step 1 with regard to server id.

For example:

[mysqld]

server-id=2

Once you are done with making the changes to the configuration file, restart the respective slave. There is no need to activate binary logging on the slave but if you want to, then it can also be used as a master to other slaves.

Step 6: Master Configuration on the Slave

In order to configure the slave to communicate with the master for replication, set up the slave with the required connection information. The following statements will need to be executed to accomplish the same.

mysql> CHANGE MASTER TO

-> MASTER_HOST='master_host_name',

-> MASTER_USER='replication_user_name',

-> MASTER_PASSWORD='replication_password',

-> MASTER_LOG_FILE='recorded_log_file_name',

-> MASTER_LOG_POS=recorded_log_position;

The following step would be needed depending on whether you have a dump to import or not.

Step 7: Replication Setup between a New Master and Slaves

If you have no snapshot of a previous database to import, then a new master is to be used for the replication.

By starting the slave & executing CHANGE MASTER TO statement, you need to set up replication between a new master and the slave.

Next, perform the above step on each slave.

Step 8: Replication Setup with Existing Data

When setting up replication with existing data, the snapshot needs to be sent from the master to the slave. This has to be performed before the replication process is started.

By considering that we have used mysqldump

  1. Start the slave, using the –skip-slave-start option. This will not trigger the replication
  2. Import the dump file:

     

    shell> mysql < fulldb.dump

With this, the slave connects to the master to replicate data updates.

The data available on slaves can be utilized continuously for analytics purposes. With this approach, the binary log will need to be observed for any changes in the data as per the events and replicate the only those events based on the position id.

We can also set up a scheduling job that notifies each time the data is updated and ensures fresh data is available for analytics purposes. 

Limitations of Setting up MySQL BinLog Based Replication Manually:

There are some limitations of Binary Log Based Replication:

  • Manually configuring & writing customs scripts becomes effort-intensive. Like explained above, this approach needs many tedious steps to work hand in hand. 
  • The process described above is time-consuming. If you are working on projects with tight deadlines, then building the entire set up manually is not advised.
  • Since each step is manually coded, the system is quite brittle. You will need to continuously monitor the infrastructure and ensure that the data replicated is consistent and accurate.
  • In case you need to apply any data enrichment or transformation before replicating the data to your destination, then you will need to build additional code to achieve this. If you are replicating data from MySQL to a data warehouse like Redshift, BigQuery or Snowflake, for data analytics purposes, this can be a huge bottleneck.

An alternative to going through all these hardships will be to use a cloud-based data automated pipeline tool like Hevo which can execute such replications in a matter of a few clicks. 

An Easier Approach – MySQL BinLog Based Replication Using Hevo

Hevo is a completely self-serve, fully managed, automated data pipeline that can help you implement MySQL BinLog based replication without having to write any code. Hevo’s point and click interface ensures the lowest time to production possible.

Here’s how simple it to set up MySQL Data Replication with Hevo:

  • Authenticate and connect your MySQL data source
  • Select the replication mode as “BinLog based replication”
  • Point to the destination data warehouse or database where you want to move data

Hevo supports MySQL change data capture out of the box and will ensure that your data is loaded from MySQL to your target database or data warehouse in real-time. Experience the power and simplicity of implementing MySQL Binlog based replication by signing up for a 14-day free trial with Hevo.

Hevo provides a greatly simplified user interface that can set up the above replication in a matter of a few clicks without the need for the developer to be an expert.

Additionally, Hevo helps you clean, transform and enrich data both before and after replicating the database, ensuring that you have analysis-ready data in your warehouse at any point.

What are your thoughts about setting up MySQL BinLog based replication? Let us know in the comments?

No-code Data Pipeline for MySQL