Database administrators used to capture changes by getting the access control rights and then string such changes to the source databases manually. But, they could not track the real-time changes due to an increase in the amount of modification in the digital world. The real-time changes of databases can be tracked by one of the most well-known approaches called Change Data Capture (CDC).
The CDC approach’s primary purpose is to indicate the changes in databases. Today, Debezium is used for event streaming of real-time changes in the databases. Debezium leverages several connectors to record the changes based on the type of databases being tracked. One such connector is the Debezium Oracle connector that is used for tracking changes in databases on the Oracle server.
Table of Contents
- What is Oracle?
- What is Debezium Oracle Connector?
- Getting started with Debezium Oracle Connector
- Steps to Capture Oracle Database with Debezium
What is Oracle?
Oracle is a relational database management system owned by Oracle corporation to enable object-oriented features like inheritance, polymorphism, and user-defined data types. As a result, Oracle is also known as an object-relational database management system. It is a platform-independent database that can run on any operating system.
For instance, if the Oracle database is running on the Linux platform, it can connect to an application on the Windows platform. Besides, Oracle has a recovery manager that permits the database administrator to perform database backups and point-in-time recovery. Oracle also enables clustering, called Oracle Real Application clusters (RAC), which provides high and reliable performance even if any server in the cluster fails.
What is Debezium Oracle Connector?
Debezium is an open-sourced, distributed event streaming platform by Red Hat that tracks changes in databases, generates events from these changes, and then sends them to applications. It uses CDC, which is a Change Data Capture approach that keeps an eye on the real-time changes of databases. Debezium consists of different connectors that connect to databases and indicate changes.
There are different connectors for different databases in Debezium like MySQL, SQL, PostgreSQL, Oracle, etc. One such connector is the Debezium Oracle connector. It collects the row-level changes of the oracle databases and generates events from them. After that, Debezium Oracle connectors send such events to the Kafka topic. These topics are then consumed by applications to further process desired tasks.
When you start the Debezium Oracle connector for the first time, it performs a consistent snapshot of the database to check the entire history. You can see this by setting the mode for Debezium Oracle Connector through snapshot.mode. After the snapshot is completed, the Debezium Oracle connector starts streaming from its last snapshots. The Debezium Oracle Connector reads from the current system change position (SCN) from the redo log. A redo log refers to the record of data used to reconstruct all the changes made to databases, including the undo changes.
Simplify Oracle ETL with Hevo’s No-code Data Pipeline
A fully managed No-code Data Pipeline platform like Hevo helps you integrate data from 100+ data sources (including 40+ Free Data Sources) like Oracle to a destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources provides users with the flexibility to bring in data of different kinds, in a smooth fashion without having to code a single line.GET STARTED WITH HEVO FOR FREE
Check Out Some of the Cool Features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ Integrations from sources like Oracle to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Firebolt, Snowflake Data Warehouses; Databricks, Amazon S3 Data Lakes, MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources, that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Simplify your Data Analysis with Hevo today!SIGN UP HERE FOR A 14-DAY FREE TRIAL!
Getting started with Debezium Oracle Connector
The redo logs on the Oracle server do not retrieve the complete history of databases. Therefore, the Debezium Oracle connector will not get the entire history from the redo logs. You will have to enable the connector to create a baseline of the current state of the database. This will help the connector in taking a continuous snapshot of the initial schema of the database.
When the connector is in its default mode, the initial snapshot will perform the following tasks:
- Identify the tables to be captured.
- It will obtain a ROW SHARE MODE, i.e., a lock on the monitored tables. This lock prevents any structural changes from occurring while taking the snapshots.
- Then, it will read the current system number (SCN) from the position server of the redo log.
- It will note the structure of all the relevant tables and release the lock that it applied earlier.
- Now, it will scan all the relevant tables as valid at the SCN position. The snapshot will then generate each event for each row in a table and write the event record to the tables-specific Kafka topic.
- Eventually, it will store all the completed snapshots in the connector offsets.
The data gained through connectors in the snapshot may be lost or incomplete. Therefore, to provide a mechanism for recapturing such table data, Debezium uses the ad-hoc snapshot. Ad-hoc snapshots function through signaling tables. You can start the snapshot by sending a request to the Debezium signaling table. Ad-hoc snapshots specify the tables that are included in the snapshot. It can either capture the entire content of the database or some content in the database.
Steps to Capture Oracle Database with Debezium
If you have a customers table in the Oracle database, you can capture the changes in the database instantly using CDC. And then monitor such changes in real-time to analyze data. You require an Oracle 12c configuration that is running on a GCP instance.
The below output gives the setup information about oracle.
Connect Oracle with Debezium in the following steps.
- Step 1: Create a database.
- Step 2: Give the name to the database.
- Step 3: Check the summary of the configuration of the database.
- Step 4: To bring Debezium to this instance, you need to create the following settings in the configuration file of the Oracle database.
- Step 5: Once the settings are done, it will show the following output in the terminal.
- Step 6: If you want to check whether you have enabled the replication to the database, use the following command.
show parameters enable_goldengate_replication
- Step 7: It will show the below output in the terminal if you have enabled replication.
- Step 8: If you have not enabled the replication, you can use the following commands:
alter system set enable_goldengate_replication = true;
Debezium can intake data from the database by accessing the XStream API. You will use tablespace for the user to capture data. A tablespace is a place where the table is stored in Oracle. You can create the tablespace with the following command.
- Step 9: Create an XStream administrator user and also a user for the Debezium connector in the container database.
- Step 10: You need to allow permission for the admin user to capture data changes. You can do it with the following command.
- Step 11: Create a user that Debezium will use to connect to the Oracle database.
- Step 12: Create an outbound XStream server with the right privileges through the following commands.
- Step 13: Create a customer table that has to be monitored by the connectors.
It can be created by the following set of queries.
The prerequisites to use Oracle database connectors are:
- Kafka: Apache Kafka is the community that streams events. It is also called a large-scale message processing platform. Kafka tends to perform with better throughput, inherent fault tolerance, and built-in partitioning.
- Kafka Connect: Kafka Connect consists of ready-to-use components. Hence, it becomes scalable and reliable for the systems. It has its JDBC connectors that are used to import data to the external systems to the Kafka topics.
- Oracle Instant Client: You can download the Oracle Instant Client through the official website.
- Step 14: Extract into a directory and then export the environment variables through the following command.
export LD_LIBRARY_PATH = /path/to/instant_client /
- Step 15: Register the Debezium Oracle connectors instance using the configuration through the following command.
- Step 16: Use the curl command to register instance settings to find a proper configuration. It should have the following output.
- Step 17: Start the Kafka Connect through the following command:
cd /opt/kafka bin/connect-standalone.sh config/connect-standalone.properties config/debezium-oracle-connector.properties
- Step 18: Execute the commit command. In the database, the connector running should show the change on the log.
- Step 19: Configure the log to output in DEBUG mode. You can configure it with the following commands:
- Step 20: Change the VALUE to DEBUG.
log4j.rootLogger=DEBUG, stdout, connectAppender
- Step 21: When the connector shows in the log that it has captured changes in the database, the messages will be published in the Kafka topic. It will be in the form of:
(database.server.name + table.whitelist) ie hostname_of_database.DEBEZIUM.CUSTOMERS:
You can see the output in the following:
- Step 22: To view the changes in the specific topic, execute the following commands.
kafka-console-consumer.sh — bootstrap-server localhost:9092 — topic TOPIC_NAME — from-beginning
- Step 23: You can go through the list of topics through the following commands to view all the topics:
kafka-topics.sh — list — bootstrap-server localhost:9092
You have completed the setting up of the Debezium Oracle Connectors.
In this tutorial, you learned the configuration and setup of the Debezium Oracle database. There are mainly two types of Debezium connectors. One is with the embedded connectors, and the other is with Kafka Connect. Besides, there are two such types of connectors: sink and source connectors. This tutorial used the sink connectors and Kafka Connect, not the embedded Debezium connectors. However, you can explore other ways to connect and incorporate in your applications for processing in real-time.
Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be challenging, and this is where Hevo saves the day! Hevo offers a faster way to move data from 100+ Data Sources including Databases or SaaS applications like Oracle into your Data Warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.VISIT OUR WEBSITE TO EXPLORE HEVO
Want to take Hevo for a spin?SIGN UP and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.