Nowadays, the use of CDC (Change Data Capture) in applications is a MUST. 

Of course, you don’t want to hear that the changes you made to the database were not reflected in the analytics. That usually happens when an hourly or nightly sync job fails to copy the changed data from one database server to another.

Kafka CDC Postgres is useful when you need to perform real-time data analytics on database systems such as PostgreSQL, where big joins and aggregations are needed. In most cases, these queries create locks because most database systems comply with the ACID properties of transactions and offer strong isolation levels.

The locks may stay in place for a long time, reducing the performance of the applications in use by live users. It therefore makes more sense to stream the data to other teams in the organization so they can perform analytics on it.

    Prerequisites

    • Apache Kafka: This will provide the topic to which data changes can be published as they happen.
    • Kafka Connect: This will be used to stream data between PostgreSQL and Apache Kafka. You will use it to define connectors that move PostgreSQL data into and out of Kafka. 
    • Debezium: This will convert PostgreSQL's WALs (write-ahead logs) into a data stream; a trimmed sample of the resulting change events is shown right after this list.
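    To make the Debezium item concrete, here is a trimmed, illustrative sketch of what a change event published to a Kafka topic typically looks like (an INSERT into a table with id and name columns). The exact fields and values depend on the connector version and converter settings, so treat this only as an indication of the shape of the data:

    {
      "payload": {
        "before": null,
        "after": { "id": 1, "name": "widget" },
        "source": { "db": "inventory_db", "schema": "public", "table": "my_table" },
        "op": "c",
        "ts_ms": 1589355600000
      }
    }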

    An Overview Of Kafka

    Any streaming application is characterized by the following capabilities:

    • It can process a stream of records in real-time. 
    • It can publish and subscribe to a stream of records, just like an enterprise messaging system or a message queue (a small console producer/consumer sketch follows this list). 
    • It can store a stream of records in a durable and fault-tolerant way. 
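    As a quick, self-contained illustration of the publish/subscribe capability, the sketch below uses Kafka's standard console tools. It assumes a broker is already reachable at localhost:9092, that the kafka-console-producer.sh and kafka-console-consumer.sh scripts from a recent Kafka distribution are on your PATH, and that the topic name demo-topic is arbitrary:

    # Publish two records to a topic (created automatically if topic auto-creation is enabled)
    printf 'first record\nsecond record\n' | kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo-topic

    # Subscribe to the same topic and read it from the beginning
    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo-topic --from-beginning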
    Supercharge your Kafka CDC Postgres Data Migration using Hevo!

    Migrating your data from Kafka to Postgres doesn’t have to be complex. Relax and go for a seamless migration using Hevo’s no-code platform. With Hevo, you can:

    1. Effortlessly extract data from 150+ connectors
    2. Tailor your data to Postgres’s needs with features like drag-and-drop and custom Python scripts.
    3. Achieve lightning-fast data loading into Postgres, making your data analysis-ready.

    Try Hevo and see why customers like Slice and Harmoney have upgraded to a powerful data and analytics stack by incorporating Hevo!

    Get Started with Hevo for Free

    What Is CDC (Change Data Capture)?

    1. CDC is the process/technology of identifying and capturing the changes that have been made to a database. The changes can then be applied to some other data repository. 
    2. The changes can also be made available in a format that can be consumed by data integration tools during the ETL (Extract, Transform, and Load) process. 
    3. It reduces the resources required for ETL (Extract, Transform, and Load) processes by using the binary log of the source database or trigger functions to ingest only the data that has changed since the last ETL operation, rather than the entire database (a minimal PostgreSQL logical-decoding sketch follows this list). 
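    To see the log-based flavor of CDC without any external tooling, you can experiment with PostgreSQL's built-in logical decoding from a psql session. This is only a minimal sketch: it assumes wal_level is set to logical, it uses the test_decoding output plugin that ships with PostgreSQL, and the slot name demo_slot is arbitrary.

    -- Create a logical replication slot that decodes WAL changes into text
    SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    -- After making any INSERT, UPDATE, or DELETE in this database, peek at the decoded changes
    SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

    -- Drop the slot when you are done so it does not retain WAL
    SELECT pg_drop_replication_slot('demo_slot');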

    Methods to Move Data Using Kafka CDC Postgres

    Method 1: Moving Data From Kafka CDC Postgres Manually

    Step 1.1: Start An Instance Of PostgreSQL

    Run the following command:

    docker run --name postgres -p 5000:5432 debezium/postgres

    The above command will start the PostgreSQL database on our system. 

    The --name option helps us specify the name of the container, which is postgres. The -p option maps port 5432 inside the container to port 5000 on the host, so the database will be reachable on port 5000. 
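    Before moving on, it is worth confirming that the container is up and that the image is configured for logical decoding, which Debezium relies on; the checks below are optional. Also note that newer versions of the underlying postgres image refuse to start without a password, in which case you would add something like -e POSTGRES_PASSWORD=postgres to the docker run command above (the connector configuration later in this guide assumes the password postgres).

    # Confirm the container is running
    docker ps --filter name=postgres

    # The debezium/postgres image ships with wal_level=logical; verify it
    docker exec -it postgres psql -U postgres -c "SHOW wal_level;"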

    Step 1.2: Start An Instance Of Zookeeper

    Just run the following command:

    docker run -it --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 debezium/zookeeper

    The command will start Zookeeper on our system. Kafka uses Zookeeper to store cluster metadata (and, in older setups, consumer offsets). 

    The -it option can also be written as -i -t. The -i option tells Docker that you need an interactive session, and the -t option attaches a tty (teletypewriter) so that terminal output is shown. The -p options map the ports used by the Zookeeper instance to the host: 2181 for client connections, and 2888/3888 for communication within a Zookeeper ensemble. 
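    If you want to confirm that Zookeeper came up cleanly before continuing, a quick look at the container logs is usually enough (this check is optional):

    # Look for a line indicating that the client port 2181 has been bound
    docker logs zookeeper | tail -n 20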

    Step 1.3: Start An Instance Of Kafka

    Use the following command:

    docker run -it --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka

    The -it option tells Docker that you need an interactive session with a tty (teletypewriter) attached, showing terminal output. The --name option specifies the name of the container. The -p option specifies the port on which the broker will be reachable. The --link option links this container to the Zookeeper container. 
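    As an optional check that the broker started and registered itself with Zookeeper, you can list the topics it knows about from inside the container. The bin/kafka-topics.sh path below assumes the layout used by the debezium/kafka image, and depending on the Kafka version you may need --bootstrap-server localhost:9092 instead of --zookeeper zookeeper:2181:

    # List existing topics (Debezium will create its own topics later)
    docker exec -it kafka bin/kafka-topics.sh --zookeeper zookeeper:2181 --list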

    Step 1.4: Start An Instance Of Debezium

    Simply run the following command:

    docker run -it --name connect -p 8083:8083 -e GROUP_ID=1 \
      -e CONFIG_STORAGE_TOPIC=my-connect-configs \
      -e OFFSET_STORAGE_TOPIC=my-connect-offsets \
      -e ADVERTISED_HOST_NAME=$(echo $DOCKER_HOST | cut -f3 -d'/' | cut -f1 -d':') \
      --link zookeeper:zookeeper --link postgres:postgres --link kafka:kafka debezium/connect

    The above command achieves a lot. Besides starting an instance of the Debezium Kafka Connect service, it links the new container to the existing Postgres, Kafka, and Zookeeper containers so that they can communicate with each other. 
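    Once the connect container is up, Kafka Connect exposes a REST API on port 8083. A quick way to confirm that it is ready before registering connectors is to hit the root endpoint and the (initially empty) connector list:

    # Returns version information for the Kafka Connect worker
    curl -H "Accept:application/json" localhost:8083/

    # Lists registered connectors; expect an empty array at this point
    curl -H "Accept:application/json" localhost:8083/connectors/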

    Step 1.5: Establish A Connection To PostgreSQL

    You can connect to the PostgreSQL database by running the following command:

    psql -h localhost -p 5000 -U postgres

    Step 1.6: Create A Database And Table 

    First, let’s create the database:

    CREATE DATABASE inventory_db;
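    The table should live inside the new database, so switch to it from the same psql session before creating the table:

    \c inventory_db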

    The table will have two columns, id (SERIAL data type) and name (VARCHAR data type). The id column will be the primary key for the table. 

    To create the table, run the following command:

    CREATE TABLE my_table(id SERIAL PRIMARY KEY, name VARCHAR);

    Congratulations! You now have a working setup. Next, you will create a connector using Kafka Connect.

    Step 1.7: Create A Connector

    Here is the command to register a connector with Kafka Connect.

    curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '
    {
      "name": "inventory_db-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "inventory_db",
        "database.server.name": "dbserver1",
        "database.whitelist": "inventory_db",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }'

    The code will create a connector and give it the name inventory_db-connector. All the parameters for the connector have been set, including the hostname, port, user, password, database name, etc. Let’s check whether the connector was created successfully!

    Step 1.8: Check Whether The Connector Was Created

    Use the following command:

    curl -X GET -H "Accept:application/json" localhost:8083/connectors/inventory_db-connector
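    The GET above returns the connector's configuration. If you also want to confirm that the connector and its task are in the RUNNING state, Kafka Connect exposes a status endpoint:

    curl -X GET -H "Accept:application/json" localhost:8083/connectors/inventory_db-connector/status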

    Now that you have a ready connector, you can start a Kafka console consumer and begin to watch the changes.

    Step 1.9: Launch Kafka Console Consumer And Watch Changes 

    Just run the command given below:

    docker run -it --name watcher --rm --link zookeeper:zookeeper debezium/kafka watch-topic -a -k dbserver1.public.my_table

    And you’re done!
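    To see events flowing, go back to the psql session connected to inventory_db and make a few changes. Each statement should appear in the watcher terminal as a change event on the dbserver1.public.my_table topic (the exact message format depends on the connector version and serializer):

    INSERT INTO my_table(name) VALUES ('Alpha');
    INSERT INTO my_table(name) VALUES ('Beta');
    UPDATE my_table SET name = 'Gamma' WHERE id = 1;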


    Method 2: Kafka CDC Postgres Using Hevo

    Step 2.1: Set Up Postgres as your Source


    Hevo Data, a No-code Data Pipeline, can set up CDC for you in minutes. You would not need working knowledge of any of the previously mentioned tools, such as Zookeeper, Kafka, or Debezium. If you want to automate CDC for your PostgreSQL database, Hevo is the best fit for you. It also offers pre-built integrations with 150+ data sources.

      More Reasons To Try Hevo

      • Data Transformation: Hevo provides a simple interface to cleanse, modify, and transform your data through drag-and-drop features and Python scripts. It can accommodate multiple use cases with its pre-load and post-load transformation capabilities.
      • Schema Management: With Hevo’s auto schema mapping feature, the source schema is automatically detected and mapped to the destination schema.
      • Scalable Infrastructure: With the increase in the number of sources and volume of data, Hevo can automatically scale horizontally, handling millions of records per minute with minimal latency.
      • Transparent pricing: You can select your pricing plan based on your requirements. You can adjust your credit limits and spend notifications for increased data flow.
      • Live Support: The support team is available round the clock to extend exceptional customer support through chat, email, and support calls.

      Conclusion

      In this article, you launched instances of PostgreSQL, Zookeeper, Kafka, and Debezium. You also connected to the PostgreSQL database and created a new database for monitoring.

      You created a connector using Kafka Connect and verified whether the connector was created successfully. Finally, you launched a Kafka Console Consumer for watching changes as they happen in real-time. 

      Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs. 

        Frequently Asked Questions

        1. Does PostgreSQL support CDC?

        Yes, PostgreSQL supports Change Data Capture (CDC) through logical replication and third-party tools.

        2. What is the CDC pipeline in Postgres?

        A CDC pipeline in Postgres captures changes (inserts, updates, deletes) in the database and streams them to other systems for processing or storage.

        3. Does Kafka support CDC?

        Yes, Kafka supports CDC through connectors like Debezium, which captures changes from databases and sends them to Kafka topics.

        Nicholas Samuel
        Technical Content Writer, Hevo Data

        Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14 years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed Java and Android applications as well as PHP web applications. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.