Oracle CDC to Kafka: Move Data Instantly

on Data Integration, ETL, Tutorials • June 15th, 2020

This post is about setting up a real-time stream from Oracle to Kafka, i.e. Oracle CDC to Kafka. Recent versions of Oracle recommend Oracle GoldenGate for everything related to replicating Oracle data. The goal of this post is to help you set up a real-time Kafka stream of changes in the Oracle database using GoldenGate.

Here is an overview of what the post will cover:

  1. Introduction to Oracle
  2. Introduction to Kafka
  3. Prerequisites to Data Transfer
  4. Oracle CDC to Kafka using Oracle GoldenGate
  5. Oracle CDC using Hevo Data

Introduction to Oracle

Oracle database is known for its ability to run different kinds of workloads including transactional and data warehouse loads. Designed in the on-premise era of software systems, Oracle is still a popular choice for enterprises because of its flexibility and supporting ecosystem. Along with the database, Oracle provides a suite of products to manage and monitor every step of a typical ETL flow making it a common choice in the data engineering world. Oracle has been improving the database steadily with the changing landscape and now even provides completely managed instances in the cloud. 

Introduction to Kafka

In a real-time ETL pipeline, it is a typical requirement to have the changes in the transactional database streamed in real-time for transformations and then loaded to a data warehouse or serving layer. A streaming platform is usually used in such cases to ensure fault tolerance and stability of the real-time pipeline. Kafka is such a streaming platform that is very popular because of its distributed nature and community support. It provides a high-performance low latency mechanism for handling real-time data feeds. 

Methods for Oracle CDC

Two popular approaches to Oracle CDC are:

  1. Oracle CDC to Kafka using Oracle GoldenGate
    This approach for achieving Oracle CDC to Kafka could be adopted when you have a team of engineers with basic knowledge of databases and message brokers. You would also need to allow sufficient time for this activity.
  2. Oracle CDC using a No-code Data Pipeline, Hevo
    Using Hevo, the data transfer from Oracle CDC could be achieved in a matter of minutes. As Hevo is a fully managed solution, code maintenance and monitoring are things of the past. Sign up for a 14-day free trial now!

Method 1: Oracle CDC to Kafka using Oracle GoldenGate

Oracle GoldenGate for Big Data is Oracle’s solution for streaming transaction data to big data platforms. GoldenGate can be used to read the data changes and write to a Kafka topic that is named after the table in which changes are being made. GoldenGate works on the basis of the extract and ‘replicat’ processes. Making this pipeline work is all about correctly configuring the extract process in the Oracle GoldenGate and the replicat process in Oracle GoldenGate for Big Data. This section will take you through how to achieve Oracle CDC to Kafka using GoldenGate.
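
At a high level, the pieces connect as follows (the process names are the ones configured later in this tutorial):

Oracle redo logs -> Extract (KAFKA_EXT) -> trail files -> Replicat (rkafka, Kafka handler) -> Kafka topic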

Oracle CDC to Kafka: GoldenGate for Big Data

Prerequisites

  • Oracle database with GoldenGate and GoldenGate for Big Data installed and set up. Follow this link for more details.
  • A Kafka instance.
  • Oracle administrator user account – a SYSDBA account (a quick connection check is shown after this list). Learn more here.
  • Basic knowledge of databases and message brokers. 
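
If you want to double-check the administrator account before starting, a quick SQL*Plus connection test works; the connect string below matches the TD1 database used later in this post and is otherwise an assumption:

sqlplus system/password@localhost:1521/TD1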

Configuring GoldenGate to Kafka replication

1. Connect to GoldenGate

The first step for setting up Oracle CDC to Kafka involves connecting to your GoldenGate instance. Use the ggsci terminal and log in to the database from the Oracle GoldenGate command interface. Note that Oracle GoldenGate and Oracle GoldenGate for Big Data are two separate installations; we are accessing the former here.
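
If you are not already inside the command interface, ggsci is started from the GoldenGate installation directory (the path below is only an example):

# from the Oracle GoldenGate home (example path)
cd /u01/app/ogg
./ggsci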

DBLOGIN USERID SYSTEM PASSWORD password

2. Configure the Extract Process

The next step is to add an extract group in the GoldenGate instance and then start the extract process. Execute the below commands to configure the extract process.

ADD EXTRACT KAFKA_EXT, TRANLOG, BEGIN NOW

ADD EXTTRAIL C:\OGG10G\dirdat\et, EXTRACT KAFKA_EXT

The first command creates an extract group called KAFKA_EXT that captures changes from the transaction log starting now; the second associates a local trail file with that group.
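
You can confirm that the group was registered (it will show as STOPPED until you start it):

INFO EXTRACT KAFKA_EXT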

3. Edit the Configuration

You will now edit the configuration for this extract group. For this use the below command to open the config file.

 EDIT PARAM KAFKA_EXT

Add the below entries to the file:

EXTRACT KAFKA_EXT
USERID system@localhost:1521/TD1, PASSWORD password
EXTTRAIL C:\OGG10G\dirdat\et
TABLE tr.customers;

The above configuration means that we are logging into the TD1 database and creating an extract for the table tr.customers.
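
If you are following along and the tr.customers table does not exist yet, a minimal definition matching the insert used later in this post (column names and sizes are assumptions) would be:

CREATE TABLE tr.customers (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(50),
  age  NUMBER
);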

4. Start the Extract Process

START EXTRACT KAFKA_EXT

Using the above command, you can start the extract process. The next step is to configure a replicat for this extract using Oracle GoldenGate for Big Data.
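
Before moving on, you can check that the extract is RUNNING:

INFO ALL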

5. Login to Oracle GoldenGate for Big Data

Log in using the now-familiar DBLOGIN command as below. Please note that Oracle GoldenGate for Big Data is a separate installation from GoldenGate, and you have to go into its own installation directory to get shell access.

 DBLOGIN USERID SYSTEM PASSWORD password

6. Configure replicat process for Oracle GoldenGate for Big Data Kafka Handler

For this, go to the dirprm directory under your Oracle GoldenGate for Big Data installation and create a file named rkafka.prm with the following content.

REPLICAT rkafka
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP TR.CUSTOMERS, TARGET TR.CUSTOMERS;

In the above configuration, GROUPTRANSOPS groups up to 10,000 source operations into a single target transaction. The MAP statement specifies the source table and its target, and the TARGETDB line points to the Kafka handler configuration (kafka.props).
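
If you need to replicate more than one table, MAP accepts wildcards; for example, to map every table in the TR schema you could use:

MAP TR.*, TARGET TR.*;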

7. Create the kafka.props File

Create the file in the same directory with the below content.

gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=kafka_producer.properties
gg.handler.kafkahandler.topicMappingTemplate=oggtopic
gg.handler.kafkahandler.keyMappingTemplate=${currentTimestamp}
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.SchemaTopicName=tpcSchemaTopic
gg.handler.kafkahandler.BlockingSend=false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=op
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
##
gg.log=log4j
gg.log.level=INFO
##
gg.report.time=30sec
##
gg.classpath=dirprm/:/home/oracle/kafka_2.11-0.9.0.0/libs/*
##
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

The most relevant bit is the first line, which specifies the handler as ‘kafkahandler’. This is the handler provided by Oracle for pushing data to Kafka. Please note that one more Kafka-related handler ships with Oracle GoldenGate for Big Data: the kafka_connect handler, an open-source handler provided by Confluent. For our exercise, we will be focusing on Oracle’s own Kafka handler.

The value for ‘gg.handler.kafkahandler.format’ determines the format in which data will be pushed to Kafka. We have used JSON here. 

The ‘gg.handler.kafkahandler.KafkaProducerConfigFile’ key in the above file points to the configuration for the Kafka producer. This file will include the target Kafka broker URL and all the details required for pushing the data to Kafka broker.
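
Note that with topicMappingTemplate=oggtopic, every change lands in a single topic named oggtopic. If you would rather have one topic per source table, the Kafka handler’s template keywords can be used instead, for example:

gg.handler.kafkahandler.topicMappingTemplate=${fullyQualifiedTableName}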

8. Create the kafka_producer.properties File

Create this file in the same directory with contents as below.

bootstrap.servers=kafka-host:9092
acks=1
compression.type=gzip
reconnect.backoff.ms=1000
value.serializer = org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer = org.apache.kafka.common.serialization.ByteArraySerializer
batch.size = 102400
linger.ms = 10000
max.request.size = 5024000
send.buffer.bytes = 5024000

The relevant bit here is the first line, which should point to your Kafka broker URL. The acks setting denotes the number of acknowledgments the producer requires before considering a request complete; increasing it improves the durability of the pipeline.
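
For example, if durability matters more than latency, you could require acknowledgment from all in-sync replicas instead of only the leader:

acks=all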

9. Add the replicat

Add the replicat as per the above configurations and start the process using the below commands.

ADD REPLICAT rkafka, EXTTRAIL /u01/app/ogg/dirdat/oe
START REPLICAT rkafka

If everything went well as per the above steps, Oracle to Kafka replication should start with the above commands. A Kafka topic will be created automatically and changes will be pushed to it; its name is controlled by topicMappingTemplate in kafka.props (a fixed oggtopic in our example, or the table name if you use a per-table template).
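
You can confirm that the topic exists from your Kafka installation directory (assuming the same 0.9-era Apache Kafka tooling referenced in gg.classpath above):

bin/kafka-topics.sh --zookeeper localhost:2181 --list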

10. Test the Setup

To test the setup, insert a row using SQL*Plus and commit the transaction.

insert into tr.customers values (42, 'ravi', 30);
commit;

11.  View the Messages

Execute the below command from your Kafka installation to view the messages (the topic name here assumes a per-table topic mapping).

kafka-console-consumer --zookeeper localhost:2181 --topic TD1.TR1.CUSTOMERS --from-beginning

{
  "payload": {
    "ID": 42,
    "NAME": "ravi",
    "AGE": 30,
    "pos": "00000000000000002010",
    "current_ts": "2016-09-02 15:56:34.111000",
    "op_ts": "2016-09-02 14:56:26.000411",
    "op_type": "I",
    "table": "TD1.TR1.CUSTOMERS"
  }
}

That completes the ordeal of configuring an Oracle CDC to Kafka pipeline. As you may have already felt, this is a very tedious configuration process involving multiple details. Implementing this in production will be a challenging experience because of the following factors.

Limitations to this Approach

  1. In most cases, data will need transformation before loading into the target database. Handling this along with this replication is even more challenging.
  2. Oracle itself maintains that configuring GoldenGate is the job of an expert Oracle administrator. So you will need an Oracle administrator along with developers to complete this execution. 

A better approach than going through this tedious configuration process manually may be to use a cloud-based ETL tool like Hevo that supports a number of source and destination combinations, including GoldenGate and Kafka. Configuring the replication can be done in a few clicks using an intuitive UI, offering a very short time to production. Hevo also comes with comprehensive in-flight transformation support.

Method 2: Oracle CDC using Hevo

Hevo, a no-code data pipeline, is a comfortable alternative for Oracle CDC. Hevo’s out-of-the-box integrations with sources like Oracle enable you to transfer data in minutes. Hevo provides an easy-to-use visual interface that allows you to load data in two easy steps:

Step 1: Connect Oracle to Hevo and configure it as a source.

Step 2: Select your destination database or data warehouse and start moving data effortlessly.

It is that simple and easy to follow!

Take a look at how you can get started with Hevo in the following video.

Benefits of Hevo

  • Easy Setup: The process involved in setting up Hevo is minimal and extremely simple to follow.
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Hassle-free Data Migration: Hevo will sense incoming data in Oracle and make corresponding modifications in Kafka alleviating the need for continuous monitoring of incoming data.
  • More Data Sources: Hevo can bring data from 100+ data sources. This ensures that you can upgrade your data infrastructure any time, at will.
  • Data Transformation Capabilities: Hevo allows you to transform the data before and after transferring it to your destination. Data is ever ready for analysis with Hevo on your side.
  • 24×7 Customer Support: Hevo provides you with customer support round the clock over call, chat, and email.

Want to take Hevo for a spin? Get started by signing up for a 14-day free trial today!

Was this post helpful? Have you set up Oracle CDC to Kafka or do you use any other method for streaming your Oracle data? Let us know in the comments section below.
