Debezium on Red hat OpenShift: 2 Easy Steps

on Database Management Systems, Debezium, Docker, MySQL, Openshift, redhat • February 21st, 2022 • Write for Hevo

debezium on redhat openshift - featured image

Debezium is the database monitoring platform that continuously captures and streams all real-time modifications updated on the respective database systems like MySQL and PostgreSQL. Usually, developers use CLI tools like the default command prompt terminal to work with Debezium, which is the traditional way of setting up the Debezium workspace. To begin working with Debezium, you have to set up and start four different services such as Kafka server, Zookeeper instance, Debezium connector service, and Kafka Connect platform. Running Debezium on Red hat OpenShift optimizes solutions.

However, starting all the services separately is a time-consuming process. To eliminate such complications, you can use containerization platforms to run all the above-mentioned services in the form of docker images. One such containerization service is Red Hat OpenShift, which allows you to run and deploy the Docker container images that comprise all the necessary services to get started with a specific application.

In this article, you will learn about Debezium, Red Hat OpenShift, and how to run Debezium on Red hat OpenShift container platform.

Table of Contents

Prerequisites

A fundamental of databases and real-time data streaming.

What is Debezium?

Debezium OpenShift: debezium logo
Image Source

Developed by Randall Hauch, a software developer of Red Hat, Debezium is an open-source and distributed platform exclusively built for implementing the CDC (Change Data Capture) approach. The Debezium platform is built on top of the Apache Kafka ecosystem, which captures and streams real-time changes from external RDBMS applications like Microsoft SQL Server, Oracle, PostgreSQL, and MySQL. Since Debezium is built on Kafka, it streams all the real-time row-level updates and modifications of specific databases into Kafka servers.

Debezium continuously updates the history of changes in Kafka logs inside Kafka servers. It provides you with a set of connectors in which each connector captures real-time changes from the respective databases. For example, the Debezium MySQL connector streams real-time changes from the MySQL database, while the Debezium Oracle connector captures modifications from the Oracle database.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 40+ free data sources) like Asana and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

GET STARTED WITH HEVO FOR FREE[/hevoButton]

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
SIGN UP HERE FOR A 14-DAY FREE TRIAL

What is Red Hat OpenShift

Debezium OpenShift: redhat
Image Source

Developed by Red Hat in 2011, OpenShift is a cloud-based platform that allows you to run containerized applications and workloads. In other words, OpenShift is a cloud-based container PaaS (Platform as a Service) that enables users to develop, deploy, and manage end-to-end applications. Since Red Hat maintains the OpenShift platform at a backend, it provides you with complete support regardless of where you choose to run and deploy your application or workloads. With Red Hat OpenShift, you can flexibly develop, deploy, and scale applications on both on-premises and cloud infrastructure.

Steps to run Debezium on Red Hat OpenShift

1. Debezium on Red Hat OpenShift: Downloading OpenShift CLI

To run Debezium on Red Hat OpenShift, you should satisfy two prerequisites. You should readily have an OpenShift CLI (Command Line Interface) and Docker. 

  • For downloading OpenShift CLI for Debezium on Red Hat OpenShift, visit the Infrastructure Provider page of the Red Hat OpenShift cluster manager site with your Red Hat login ID. Select your appropriate infrastructure provider. Now, in the command line interface section, select your system OS type, in this case, Windows, from the drop-down menu. After choosing the respective OS, click on the “Download command-line tools” button. 
  • Now, the CLI tool is downloaded to your local machine and can be used for the next steps of Debezium on Red Hat OpenShift. Unzip the setup file of the CLI tool and move the “oc” binary file to your preferred file path.
  • After moving the OpenShift CLI tool to your preferred directory, you are now ready to run OpenShift commands for Debezium on Red Hat OpenShift by following the below-given syntax. 
C:> oc <command>
  • Before executing commands for Debezium on Red Hat OpenShift, you have to log in to the oc (OpenShift CLI) for managing the OpenShift cluster. For logging in with oc, you should readily have a cluster in the OpenShift Container Platform.
  • Execute the following command to log in with oc.
$ oc login
  • Then, enter the web URL of the OpenShift Container Platform.
Debezium OpenShift: insecure connection
Image Source
  • In the next step of Debezium on Red Hat OpenShift, you are prompted to decide whether to use insecure connections or not. As shown in the image above, answer with yes or no (Y/N) in your command prompt. 
Debezium OpenShift: username
Image Source
  • Then, enter the respective user name and password of OpenShift CLI. Now, you are successfully logged in with OpenShift CLI.

Following the above-given steps, you successfully installed and configured the OpenShift CLI. In the further steps of Debezium on Red Hat OpenShift, you will use the OpenShift CLI as the command terminal to manage projects and clusters of your OpenShift Container Platform. 

2. Debezium on Red Hat OpenShift: Running Debezium on Red Hat OpenShift

  • Initially, you have to install the necessary operators and templates by cloning the GitHub repository of strimzi. Strimzi provides various operators and templates for managing the Kafka clusters running at any container platforms like OpenShift and Kubernetes.
  • For exporting the preferred Strimzi version for Debezium on Red Hat OpenShift, open your OpenShift CLI and execute the following command.
export STRIMZI_VERSION=0.18.0
  • After exporting, execute the command below to clone the Strimzi operators and templates from the GitHub repository.
git clone -b $STRIMZI_VERSION https://github.com/strimzi/strimzi-kafka-operator
  • Then, change the directory to “strimzi-kafka-operator” by executing the below command.
cd strimzi-kafka-operator
  • Now, switch to the admin user and install the operators and templates for managing the Kafka cluster.
oc login -u system:admin
oc create -f install/cluster-operator && oc create -f examples/templates/cluster-operator
  • In the next step, execute the following command to deploy a Kafka broker cluster.
oc process strimzi-ephemeral -p CLUSTER_NAME=broker -p ZOOKEEPER_NODE_COUNT=1 -p KAFKA_NODE_COUNT=1 -p KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 | oc apply -f -
  • On executing the above command, you created a single instance of Kafka broker and a Kafka topic with a single replication factor. 
  • In the next step, you will create a Kafka Connect docker image containing pre-installed Debezium connectors. Initially, you have to download and extract the respective Debezium connector you want to run on OpenShift. 
  • In this case, since you are running Debezium’s MySQL connector on the OpenShift platform, you have to particularly download the Debezium MySQL connector. 
  • Run the following command to download and extract the Debezium MySQL connector.
curl https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/1.8.1.Final/debezium-connector-mysql-1.8.1.Final-plugin.tar.gz tar xvz
  • Now, create a Docker file that contains the base image as Strimzi Kafka. Execute the following command to include the previously downloaded Debezium MySQL connector into the Docker file.
Debezium OpenShift: sql docker
Image Source
  • On running the above command, you created the plugins/debezium directory that contains separate folders for each Debezium connector that you run on the OpenShift platform.
  • Now, build the Debezium image from the previously created Docker file and push it into your own Docker container registry or platform. Execute the following command to implement the pushing process.
export DOCKER_ORG=debezium-community
docker build . -t ${DOCKER_ORG}/connect-debezium
docker push ${DOCKER_ORG}/connect-debezium
  • In the above command, you can replace debezium-community with your docker hub registry name.
  • Now, you can check the status of the running instances from your Docker container. Execute the command given below to view the active status.
oc get pods
  • On executing the above command, you get the output as shown below. 
Debezium OpenShift: running entities
Image Source
  • From the above image, you can conclude that all the instances that belong to Docker containers are running successfully.
  • You can also see the active status of instances from your OpenShift web GUI. 
  • Navigate to the “Pods” view in the OpenShift web console or follow the link to check the status of your instances. 
Debezium OpenShift: my project
Image Source
  • Your OpenShift web console resembles the above image, in which you can see the status of instances and their age (the exact amount of time since when the respective instances were created).
  • Now, you can check whether the Debezium MySQL connector is working properly on the OpenShift platform.
  • In the OpenShift CLI, execute the following command to start the MySQL database.
oc new-app --name=mysql debezium/example-mysql:1.8
  • After starting the database, you have to set the appropriate credentials for the MySQL instance.
oc set env dc/mysql MYSQL_ROOT_PASSWORD=debezium MYSQL_USER=mysqluser MYSQL_PASSWORD=mysqlpw
  • On executing the above-given command, run the oc get pods command to check the status of the newly created MySQL instance.
Debezium OpenShift: mysql instances
Image Source
  • The output will resemble the above image, which confirms your MySQL instance is running successfully.
  • In the next step, you have to register the Debezium MySQL connector to run in the previously created MySQL instance. 
Debezium OpenShift: sql connector
Image Source
  • Now, the Debezium connector is ready to read or capture the change of events from the corresponding Customer table in the Kafka server. Execute the following command to read changes.
oc exec -it broker-kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --property print.key=true --topic dbserver1.inventory.customers
  • On running the above command, the change of events captured by the connector is in the format, as shown below, where the output follows the JSON format having the key-value pair.
Debezium OpenShift: sql json
Image Source

On executing the above-mentioned steps, you successfully installed, configured, and executed Debezium on Red Hat OpenShift.

Conclusion 

In this article, you learned about Debezium, Red Hat OpenShift, how to create OpenShift CLI to run Debezium on Red Hat OpenShift platform. This article primarily focused on running Debezium connector instances on OpenShift using the OpenShift CLI. However, you can also use the web GUI or interface of the Red Hat OpenShift platform to install and manage Debezium connectors and operators.

MongoDB and PostgreSQL are trusted source that a lot of companies use as it provides many benefits but transferring data from it into a data warehouse is a hectic task. The Automated data pipeline helps in solving this issue and this is where Hevo comes into the picture. Hevo Data is a No-code Data Pipeline and has awesome 100+ pre-built Integrations that you can choose from.

visit our website to explore hevo[/hevoButton]

Hevo can help you Integrate your data from numerous sources and load them into a destination to Analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure.

SIGN UP for a 14-day free trial and see the difference!

Share your experience of learning about Debezium on Red Hat OpenShift to MongoDB in the comments section below.

No-code Data Pipeline For Your Data Warehouse