Nowadays, the use of CDC (Change Data Capture) in applications is a MUST.
Of course, you don’t want to hear that the changes that you made to the database were not reflected in the analytics. This can result from a failure by an hourly or nightly sync job to pull or push the changes made to data on one database server from/to another database server.
Kafka CDC Postgres is required to perform real-time data analytics on database systems such as PostgreSQL, big joins and aggregations are needed. In most cases, this creates locks because most database systems comply with the ACID properties of transactions and offer good isolation levels.
The locks may stay in place for long, reducing the performance of the applications in use by live users. Thus, it will make more sense for you to stream data to other organization teams to perform analytics on it.
-
Prerequisites
- Apache Kafka: This will create a topic to which data changes can be published as they happen.
- Kafka Connect: This will be used to stream data between PostgreSQL and Apache Kafka. You will use it to define connectors that will help us move data from PostgreSQL in and out of Kafka.
- Debezium: This will help us convert WALs (write-ahead logs) into a data stream.
An Overview Of Kafka
Any streaming application is characterized by the following capabilities:
- It can process a stream of records in real-time.
- It can publish and subscribe to a stream of records, just like an enterprise messaging system or a message queue.
- It can store a stream of records in a durable and fault-tolerant way.
Migrating your data from Kafka to Postgres doesn’t have to be complex. Relax and go for a seamless migration using Hevo’s no-code platform. With Hevo, you can:
- Effortlessly extract data from 150+ connectors.
- Tailor your data to Snowflake’s needs with features like drag-and-drop and custom Python scripts.
- Achieve lightning-fast data loading into Postgres, making your data analysis-ready.
Try to see why customers like FairMoney and Harmoney have upgraded to a powerful data and analytics stack by incorporating Hevo!
Get Started with Hevo for Free
What Is CDC (Change Data Capture)?
- CDC is the process/technology of identifying and capturing the changes that have been made to a database. The changes can then be applied to some other data repository.
- The changes can also be made available in a format that can be consumed by data integration tools during the ETL (Extract, Transform, and Load) process.
- It reduces the resources that are required for the ETL processes by using the binary log of the source database or trigger functions to ingest the only data that has changed since the last ETL operation rather than the entire database.
Methods to Move Data Using Kafka CDC Postgres
Method 1: Moving Data From Kafka CDC Postgres Manually
Step 1.1: Start An Instance Of PostgreSQL
Run the following command:
docker run — name postgres -p 5000:5432 debezium/postgres
The above command will start the PostgreSQL database on our system.
The -name option helps us specify the name of the database, which is Postgres. The -p option helps us specify the port on which the database will be running. The database will be running on port number 5000.
Step 1.2: Start An Instance Of Zookeeper
Just run the following command:
docker run -it — name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper
The command will start Zookeeper on our system for storage of offsets from the consumer.
The -it option can also be written as -i -t. The -i option tells docker that you need an interactive session. The -t option tells docker that you need a tty (teletypewriter) attached, showing terminal-related information. The -p option helps us specify the ports that will be used by the Zookeeper instance.
Step 1.3: Start An Instance Of Kafka
Use the following command:
docker run -it — name kafka -p 9092:9092 — link zookeeper:zookeeper debezium/kafka
The -it option tells docker that you need an interactive session with tty (teletypewriter) attached, showing terminal-related information. The -name option helps us specify the name of the instance. The -p option helps us specify the port on which the instance will be running. The –link option helps us link the two containers.
Step 1.4: Start An Instance Of Debezium
Simply run the following command:
docker run -it — name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my-connect-configs -e
OFFSET_STORAGE_TOPIC=my-connect-offsets -e
ADVERTISED_HOST_NAME=$(echo $DOCKER_HOST | cut -f3 -d’/’ | cut -f1 -d’:’) — link zookeeper:zookeeper — link postgres:postgres — link kafka:kafka debezium/connect
The above command helps us achieve a lot. Other than starting an instance of Debezium, you have used this instance to link our existing containers together. The containers are Postgres, Kafka, and Zookeepers. The purpose of linking the containers is to have them communicate with each other.
Step 1.5: Establish A Connection To PostgreSQL
You can connect to the PostgreSQL database by running the following command:
psql -h localhost -p 5000 -U postgres
Step 1.6: Create A Database And Table
First, let’s create the database:
CREATE DATABASE inventory_db;
The table will have two columns, id (SERIAL data type) and name (VARCHAR data type). The id column will be the primary key for the table.
To create the table, run the following command:
CREATE TABLE my_table(id SERIAL PRIMARY KEY, name VARCHAR);
Congratulations! You now have a ready setup. Next, you will be creating a connector to the Kafka connector.
Step 1.7: Create A Connector
Here is the code to help you register a connector to the Kafka Connect.
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '
{
"name": "inventory_db-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "inventory_db",
"database.server.name": "dbserver1",
"database.whitelist": "inventory_db",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory"
}
}'
The code will create a connector and give it the name inventory_db-connector. All the parameters for the connector have been set, including the hostname, port, user, password, database name, etc. Let’s check whether the connector was created successfully!
Step 1.8: Check Whether The Connector Was Created
Use the following command:
curl -X GET -H "Accept:application/json" localhost:8083/connectors/inventory_db-connector
Now that you have a ready connector, you can start a Kafka console and begin to watch the changes.
Step 1.9: Launch Kafka Console Consumer And Watch Changes
Just run the command given below:
docker run -it — name watcher — rm — link zookeeper:zookeeper debezium/kafka watch-topic -a -k dbserver1.public.my_table
And you’re done!
Integrate Kafka to PostgreSQL
Integrate PostgreSQL to PostgreSQL
Method 2: Kafka CDC Postgres Using Hevo
Step 2.1: Set Up Postgres as your Source
Hevo Data, a No-code Data Pipeline can achieve CDC easily in minutes. You would not need a working knowledge of any of the previously mentioned applications like Zookeeper and Kafka. If you want to automate the process of the CDC in your PostgreSQL database, Hevo is the best product for you. They also offer pre-built integration to 150+ data sources.
More Reasons To Try Hevo
- Data Transformation: Hevo provides a simple interface to cleanse, modify, and transform your data through drag-and-drop features and Python scripts. It can accommodate multiple use cases with its pre-load and post-load transformation capabilities.
- Schema Management: With Hevo’s auto schema mapping feature, all your mappings will be automatically detected and managed to the destination schema.
- Scalable Infrastructure: With the increase in the number of sources and volume of data, Hevo can automatically scale horizontally, handling millions of records per minute with minimal latency.
- Transparent pricing: You can select your pricing plan based on your requirements. You can adjust your credit limits and spend notifications for increased data flow.
- Live Support: The support team is available round the clock to extend exceptional customer support through chat, email, and support calls.
Work with Kafka and Postgres Seamlessly using Hevo!
No credit card required
Conclusion
In this article, you launched instances of PostgreSQL, Zookeeper, Kafka, and Debezium. You also connected to the PostgreSQL database and created a new database for monitoring.
You created a connector using Kafka Connect and verified whether the connector was created successfully. Finally, you launched a Kafka Console Consumer for watching changes as they happen in real-time.
Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Frequently Asked Questions
1. Does PostgreSQL support CDC?
Yes, PostgreSQL supports Change Data Capture (CDC) through logical replication and third-party tools.
2. What is the CDC pipeline in Postgres?
A CDC pipeline in Postgres captures changes (inserts, updates, deletes) in the database and streams them to other systems for processing or storage.
3. Does Kafka support CDC?
Yes, Kafka supports CDC through connectors like Debezium, which captures changes from databases and sends them to Kafka topics.
Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.