MySQL is an open-source relational database system that is part of many enterprise tech stacks worldwide. Even though recent licensing restrictions imposed by Oracle have caused a slight drop in popularity, it remains one of the most widely used relational databases. In a typical architecture, MySQL serves as the transactional database, while a separate data warehouse supports analytical and reporting requirements.
This architecture creates a recurring need to move or copy data from MySQL to the data warehouse. In some cases, this copy must happen in real time because the reports that depend on it are business-critical. Change data capture (CDC) is the paradigm designed for such real-time sync requirements. This post covers three ways to implement MySQL CDC for different target databases: MySQL triggers, binary logs, and an automated third-party tool. It also highlights the pros and cons of each approach so that you can evaluate your options and choose the path that suits you best.
Introduction to MySQL
In 2019, more than 39% of developers worldwide used MySQL, making it the world's most popular database. Although it lacks some of the extensive features provided by PostgreSQL, MySQL is still a good fit for many applications, such as web applications.
MySQL is a standard component of the LAMP stack, an open-source web application stack made up of Linux, Apache HTTP Server, MySQL, and PHP. As a result, it is a go-to choice for scalable web applications. The following salient features make MySQL so popular.
Key Features of MySQL
- Maintained by Oracle: Oracle owns and maintains MySQL. It also offers premium versions of MySQL with additional services, proprietary plugins, user support, and extensions.
- Long History: MySQL has been around for over 20 years since its first release in 1995.
- Frequent Updates: MySQL receives regular updates with new features and security improvements. At the time of writing, the latest release was version 8.0.23, released on 18 January 2021.
- MVCC Features: InnoDB, MySQL's default storage engine, supports MVCC (Multi-Version Concurrency Control), which lets reads proceed without blocking concurrent writes.
- A Supportive Community: A devoted community of developers is available to help troubleshoot as and when needed.
- Open-Source: MySQL is also a free and Open-Source Relational Database Management System (RDBMS).
- Users: MySQL is used widely by Google, NASA, Flickr, GitHub, Netflix, Tesla, Twitter, Uber, Wikipedia, YouTube, and Zendesk, to name a few.
What is CDC?
Change data capture (CDC) is a process that identifies changes made to data in a database, captures them, and delivers them in real time to a downstream process or system. Because it moves only the changed data rather than entire tables, CDC is an efficient way to move data across a wide area network, which makes it well suited to modern cloud architectures. Since it moves data in real time, it also supports real-time analytics and data science.
Benefits of Using CDC
- Real-time Data Synchronization: Changes are captured and applied to target systems as they happen, keeping data current across systems.
- Efficient ETL Processes: Because only changed data is processed, repeated full-table extracts become unnecessary, which speeds up ETL processes.
- Audit and Compliance: When the captured changes are retained, they form a historical record of every change made to the data, which makes audits easier and helps with regulatory compliance.
- Improved Data Analytics: Fresher data reaches analytics systems sooner, which supports faster and better-informed decisions.
- Improved Data Integration: Because changes are captured consistently across systems, CDC simplifies integrating data from multiple sources.
What are the Methods to Set up MySQL CDC?
These are the methods you can use to set up MySQL CDC:
1. Using MySQL Triggers
The simplest way to implement change data capture is to add a timestamp column to the table and update it every time a row changes. An external script can then poll the table and act on every row whose timestamp is later than the script's last access time. This approach has two problems. First, developers often lack the flexibility to modify existing tables to accommodate CDC logic. Second, polling a timestamp cannot detect rows that have been deleted. MySQL triggers avoid both problems.
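The timestamp-polling approach described above can be sketched as follows. This is a minimal illustration, not a production poller. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The students table, its last_modified column, and the poll_changes helper are assumptions made for the example. In MySQL, such a column would typically be declared as TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP so the server maintains it. Here, the writes set it explicitly to keep the demo deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption, for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY,"
    " age INTEGER,"
    " name TEXT,"
    " last_modified INTEGER NOT NULL)"  # must be bumped on every insert and update
)


def poll_changes(conn, last_seen):
    """Return rows changed after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, age, name, last_modified FROM students"
        " WHERE last_modified > ? ORDER BY last_modified",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][3] if rows else last_seen
    return rows, new_mark


# Integer "timestamps" keep the demo deterministic.
conn.execute("INSERT INTO students VALUES (1, 15, 'alex', 100)")
conn.execute("INSERT INTO students VALUES (2, 17, 'sam', 101)")
first_batch, mark = poll_changes(conn, last_seen=0)

# Update one row and bump its timestamp, then poll again from the saved mark.
conn.execute("UPDATE students SET age = 16, last_modified = 102 WHERE id = 1")
second_batch, mark = poll_changes(conn, last_seen=mark)

# A hard delete leaves no row behind, so polling never sees it.
conn.execute("DELETE FROM students WHERE id = 2")
third_batch, mark = poll_changes(conn, last_seen=mark)

print(first_batch)   # [(1, 15, 'alex', 100), (2, 17, 'sam', 101)]
print(second_batch)  # [(1, 16, 'alex', 102)]
print(third_batch)   # []
```

The strict > comparison has a weakness. A row stamped with the same value as the saved mark can be missed if it commits after the poll runs. That weakness, together with the invisible delete, is why the trigger-based approach is usually preferred.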
Triggers are database hooks that execute when predefined events happen in the database. A trigger can listen for INSERT, UPDATE, or DELETE operations. When implementing CDC, triggers are configured to insert a new row into a separate change table every time an INSERT, UPDATE, or DELETE happens on the source table. Consider a simple example of a trigger for an INSERT operation. Assume there is a source table named students with id, age, and name as columns.
- Create a second table, student_cdc, to capture the changes as shown below.
CREATE TABLE IF NOT EXISTS student_cdc (
  cdc_id BIGINT NOT NULL AUTO_INCREMENT,
  action VARCHAR(100),
  id BIGINT,
  age BIGINT,
  name VARCHAR(100),
  last_modified_dt TIMESTAMP,
  PRIMARY KEY (cdc_id)
) ENGINE=InnoDB;
- Create a trigger as below.
-- Log every new row of students into student_cdc
CREATE TRIGGER student_insert AFTER INSERT ON students
FOR EACH ROW
  INSERT INTO student_cdc (action, id, age, name, last_modified_dt)
  VALUES ('insert', NEW.id, NEW.age, NEW.name, NOW());
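The trigger above records only inserts. As a rough sketch of the complete pattern, the following script adds matching UPDATE and DELETE triggers and shows what ends up in the change table. It uses SQLite, via Python's standard sqlite3 module, as a stand-in for MySQL so that it runs anywhere. SQLite's trigger syntax differs slightly from MySQL's: it requires BEGIN ... END, and here the timestamp comes from a CURRENT_TIMESTAMP column default rather than NOW(). Treat the SQL as illustrative rather than as copy-paste MySQL.

```python
import sqlite3

# SQLite stands in for MySQL here; the trigger syntax is SQLite's, not MySQL's.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, age INTEGER, name TEXT);

    -- Change table, mirroring student_cdc from the MySQL example.
    CREATE TABLE student_cdc (
        cdc_id INTEGER PRIMARY KEY AUTOINCREMENT,
        action TEXT, id INTEGER, age INTEGER, name TEXT,
        last_modified_dt TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- One trigger per operation: NEW is the row after the change, OLD the row before.
    CREATE TRIGGER student_insert AFTER INSERT ON students FOR EACH ROW
    BEGIN
        INSERT INTO student_cdc (action, id, age, name)
        VALUES ('insert', NEW.id, NEW.age, NEW.name);
    END;

    CREATE TRIGGER student_update AFTER UPDATE ON students FOR EACH ROW
    BEGIN
        INSERT INTO student_cdc (action, id, age, name)
        VALUES ('update', NEW.id, NEW.age, NEW.name);
    END;

    CREATE TRIGGER student_delete AFTER DELETE ON students FOR EACH ROW
    BEGIN
        INSERT INTO student_cdc (action, id, age, name)
        VALUES ('delete', OLD.id, OLD.age, OLD.name);
    END;
    """
)

# Ordinary writes to the source table; the triggers populate student_cdc.
conn.execute("INSERT INTO students VALUES (1, 15, 'alex')")
conn.execute("UPDATE students SET age = 16 WHERE id = 1")
conn.execute("DELETE FROM students WHERE id = 1")

log = conn.execute(
    "SELECT cdc_id, action, id, age, name FROM student_cdc ORDER BY cdc_id"
).fetchall()
for row in log:
    print(row)
# (1, 'insert', 1, 15, 'alex')
# (2, 'update', 1, 16, 'alex')
# (3, 'delete', 1, 16, 'alex')
```

The triggers run inside the same transaction as the original write. A rolled-back write therefore leaves no change row behind. The cost is that every write to the source table now does extra work.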
Once this is in place, a script can poll the change table and write the captured changes to the target database.
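Below is a minimal sketch of such a polling script's apply step. It assumes change rows shaped like (cdc_id, action, id, age, name), as selected from student_cdc ordered by cdc_id. To keep it self-contained, the "target database" is a plain Python dict keyed by id. The checkpoint is simply the highest cdc_id applied so far. apply_changes is a hypothetical helper. A real script would write to the actual target store and persist the checkpoint durably.

```python
from typing import Dict, Iterable, Tuple

# (cdc_id, action, id, age, name), as selected from student_cdc ORDER BY cdc_id
ChangeRow = Tuple[int, str, int, int, str]


def apply_changes(
    rows: Iterable[ChangeRow],
    target: Dict[int, dict],
    last_cdc_id: int,
) -> int:
    """Apply change rows newer than last_cdc_id to target; return the new checkpoint."""
    for cdc_id, action, row_id, age, name in sorted(rows):
        if cdc_id <= last_cdc_id:
            continue  # already applied in an earlier poll, so replays are harmless
        if action in ("insert", "update"):
            target[row_id] = {"age": age, "name": name}  # upsert into the target
        elif action == "delete":
            target.pop(row_id, None)
        else:
            raise ValueError(f"unknown action {action!r}")
        last_cdc_id = cdc_id
    return last_cdc_id


# Hypothetical usage: two polls whose results overlap by one row.
target: Dict[int, dict] = {}
batch1 = [(1, "insert", 1, 15, "alex"), (2, "insert", 2, 17, "sam")]
checkpoint = apply_changes(batch1, target, last_cdc_id=0)

batch2 = [(2, "insert", 2, 17, "sam"), (3, "update", 1, 16, "alex"), (4, "delete", 2, 17, "sam")]
checkpoint = apply_changes(batch2, target, checkpoint)

print(target)      # {1: {'age': 16, 'name': 'alex'}}
print(checkpoint)  # 4
```

Skipping rows at or below the checkpoint means an overlapping or repeated poll cannot apply the same change twice. Purging already-applied rows from student_cdc would be a separate cleanup job.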
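Even though this is a straightforward approach, it has several disadvantages:
- Triggers run inside every write transaction on the source table, which adds overhead to the transactional workload.
- The change table grows continuously and must be purged.
- You still have to build, schedule, and monitor the polling script yourself.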
2. Using Binary Logs
MySQL binary logs provide a very efficient way to track data changes for MySQL CDC. They contain events that describe modifications to data and to table definitions. Because committed changes are recorded in order, a consumer can replay the log to reconstruct every change made since logging began, within the retention period of the log files. Binary logging is enabled by default in MySQL 8.0. On older versions, the server must be started with the --log-bin option.
The best way to understand how binary logs work is to use the mysqlbinlog utility, which prints binary log contents in human-readable form. A number of open-source tools build on the binary log to ease the developer's burden in implementing MySQL CDC. A good example is Maxwell, which reads the binary log and writes change events as JSON to Kafka. Developers can implement Kafka consumers that process the events generated by Maxwell and write them to the appropriate target databases.
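The core of such a consumer is parsing each Maxwell message and applying it to the target. The sketch below uses only the Python standard library. It assumes Maxwell's JSON message shape: fields such as database, table, type, and data, where data holds the full row. It also assumes a single-column primary key named id. The target is modeled as a dict, handle_maxwell_message is a hypothetical helper, and the sample messages are made up. In a real deployment, the raw strings would come from a Kafka consumer loop.

```python
import json
from typing import Dict, Tuple

Key = Tuple[str, str, int]  # (database, table, primary-key value)


def handle_maxwell_message(raw: str, target: Dict[Key, dict], pk: str = "id") -> None:
    """Apply one Maxwell-style JSON message to an in-memory target store.

    Assumes each table has a single-column primary key named `pk`.
    """
    event = json.loads(raw)
    kind = event.get("type")
    if kind not in ("insert", "update", "delete"):
        return  # DDL and other message types are ignored in this sketch
    row = event["data"]  # full row: new values, or the deleted row for deletes
    key = (event["database"], event["table"], row[pk])
    if kind == "delete":
        target.pop(key, None)
    else:
        target[key] = row  # inserts and updates are both treated as upserts


# Hypothetical sample messages (values invented for illustration).
messages = [
    '{"database":"school","table":"students","type":"insert",'
    '"commit":true,"data":{"id":1,"age":15,"name":"alex"}}',
    '{"database":"school","table":"students","type":"update",'
    '"commit":true,"data":{"id":1,"age":16,"name":"alex"},"old":{"age":15}}',
]
store: Dict[Key, dict] = {}
for message in messages:
    handle_maxwell_message(message, store)

print(store)  # {('school', 'students', 1): {'id': 1, 'age': 16, 'name': 'alex'}}
```

Treating inserts and updates as upserts keeps the handler safe to re-run if Kafka redelivers a message.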
Before using these binary log applications, add the following snippet to the MySQL server configuration file (for example, my.cnf). It enables binary logging in row format.
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
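expire_logs_days = 10
# On MySQL 8.0 and later, expire_logs_days is deprecated; use binlog_expire_logs_seconds = 864000 instead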
max_binlog_size = 100M
binlog-format = row
Once the server has been restarted with the above configuration, log in to the MySQL shell and perform an insert operation.
insert into students (age, name) values (15, 'alex');
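Assuming Maxwell is installed, use the following command to view Maxwell's output in the terminal. Maxwell also needs a MySQL user with replication privileges; see its documentation for the exact grants.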
./bin/maxwell --user='mysql_maxwell_user' --password='maxwell_password' --host='127.0.0.1' --producer=stdout
The output will look similar to the following.
{"database":"school","table":"students","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"age":15,"name":"alex"}}
Another good example of a binary log-based application is python-mysql-replication, an open-source Python library that can be installed from PyPI as mysql-replication. It exposes the binary log as a stream of events that can be processed and written to target databases. The code to print the stream is as simple as the following snippet.
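from pymysqlreplication import BinLogStreamReader
# Replace with the credentials of a MySQL user that has replication privileges
mysql_settings = {'host': '127.0.0.1', 'port': 3306, 'user': 'mysql_user', 'passwd': 'mysql_password'}
# server_id must be unique among the servers and clients replicating from this instance
log_stream = BinLogStreamReader(connection_settings=mysql_settings, server_id=100)
# Print each binary log event in readable form
for binlogevent in log_stream:
    binlogevent.dump()
log_stream.close()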
As simple as this may seem, reading the stream is only a small part of the job. Some of the remaining challenges are as follows.
- Once binary log listening is in place, there is still the tedious job of learning the target database and implementing custom logic to write to it.
- There also needs to be a mechanism that records how far the binary log has been processed, so that the process can be restarted from that point if something goes wrong.
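One common way to handle the restart problem is to persist the binary log coordinates (file name and position) after each batch has been written to the target. The process then resumes from them on restart. The sketch below uses only the Python standard library. The checkpoint file name binlog_checkpoint.json, the helper names, and the sample coordinates are hypothetical. With python-mysql-replication, the saved values could be passed to BinLogStreamReader through its log_file, log_pos, and resume_stream parameters. Check the library's documentation for your version.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def save_checkpoint(path: str, log_file: str, log_pos: int) -> None:
    """Record the last binlog position whose changes were fully applied."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # temp file in the same directory
    with os.fdopen(fd, "w") as f:
        json.dump({"log_file": log_file, "log_pos": log_pos}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp_path, path)  # atomic rename on POSIX systems


def load_checkpoint(path: str) -> Optional[Tuple[str, int]]:
    """Return (log_file, log_pos) from the previous run, or None on first start."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    return data["log_file"], data["log_pos"]


# Hypothetical usage with made-up coordinates.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "binlog_checkpoint.json")
    first_start = load_checkpoint(path)               # None: nothing saved yet
    save_checkpoint(path, "mysql-bin.000003", 4567)   # after a batch commits downstream
    resumed = load_checkpoint(path)

print(first_start)  # None
print(resumed)      # ('mysql-bin.000003', 4567)
```

Saving the checkpoint only after the target write commits means a crash can cause a batch to be replayed, but never skipped. For that reason, writes to the target should be idempotent.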
Compared with the above approaches, a much better way would be to use an automated ETL solution such as Hevo, which can perform CDC-based copy operations seamlessly.
What are the Challenges of Using Manual Methods to Set Up MySQL CDC?
- Complex configuration: Setting up CDC manually requires deep technical knowledge of MySQL logs and replication.
- Performance overhead: Triggers add work to every write, and continuously reading binary logs adds load to the MySQL server.
- Data consistency issues: Ensuring no data loss or duplication during real-time changes is challenging.
- Scalability limitations: Managing CDC across large datasets and multiple tables manually is inefficient and error-prone.
What are the Key Advantages of Using MySQL?
MySQL is a lightweight database that developers can install and use on production application servers with large multi-tier applications. Leveraging MySQL as a part of your workflow has several advantages:
- MySQL is a highly flexible and scalable Database Management System.
- MySQL supports features like source-replica (master-slave) replication and scale-out.
- MySQL also supports Offload Reporting and Geographic Data Distribution.
- The MyISAM storage engine has a very low overhead when used for read-only applications.
- The MEMORY storage engine can keep frequently used tables in memory for fast access.
- MySQL 5.7 and earlier provide a query cache for repeatedly executed statements; it was removed in MySQL 8.0.
- MySQL is easy to learn and troubleshoot, given the various helpful sources, such as blogs, white papers, and books on the subject.
Conclusion
This blog covered three ways to set up MySQL CDC: MySQL triggers, binary logs, and a third-party tool, Hevo.
Frequently Asked Questions
1. What is MySQL CDC?
MySQL CDC (Change Data Capture) refers to techniques or tools used to capture and track changes in a MySQL database.
2. What is CDC in SQL?
CDC (Change Data Capture) in SQL refers to a feature or process used to detect and track changes made to data in a SQL database.
3. What is the role of CDC in SQL?
a) Data Replication
b) Real-Time Processing
c) Data Warehousing and ETL
d) Audit and Compliance
e) Event-Driven Architectures
With over a decade of experience, Sarad has been instrumental in designing and developing Hevo's fundamental components. His expertise lies in building lean solutions for various software challenges. Sarad is passionate about mentoring fellow engineers and continually exploring new technologies to stay at the forefront of the industry. His dedication and innovative approach have made significant contributions to Hevo's success.