Businesses have a lot of data residing in multiple software systems in varying formats. This data needs to be coalesced in some way to get useful insights and analytics on the performance of the business. Many companies set up Kafka Oracle integration, using Oracle CDC to Kafka, as a publish-subscribe messaging system that allows messages to be published to and consumed from Kafka topics.
Kafka can act as a pipeline that registers all the changes happening to your data and moves them between sources, such as an Oracle database, and destinations.
In this article, we will see how to set up Kafka Oracle integration.
What is Kafka?
Kafka is a stream-based, distributed message broker that receives messages from publishers and distributes them to subscribers. Kafka stores messages in physically distributed locations, processes streams, and responds to events.
To reduce the overhead of network round trips, Kafka groups messages together into a “message set” abstraction. This leads to larger network packets, larger sequential disk operations, and contiguous memory blocks, allowing Kafka to turn a bursty stream of random message writes into linear writes. Kafka is used for event processing, real-time monitoring, log aggregation, and queuing.
What is Oracle?
Oracle is a popular Relational Database Management System (RDBMS), known as Oracle Database, OracleDB, or simply Oracle. It is developed and managed by Oracle Corporation.
OracleDB is a cross-platform application and runs smoothly on operating systems such as Windows, Unix, and macOS. Oracle was the first database designed for business and enterprise grid computing to manage information. Oracle supports Structured Query Language (SQL) for interacting with data.
Compared to other databases, Oracle is highly scalable, portable, and easily programmable. Anyone with basic knowledge of SQL can execute queries efficiently.
Now that you have a basic understanding of both platforms, let’s dive straight into the Kafka Oracle integration part.
Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources straight into your Data Warehouse or any Databases. To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!
Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!
Hevo offers a faster way to move data from databases or SaaS applications such as Oracle and Kafka into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Tools and Techniques for Setting Up Kafka Oracle Integration
In this section, you will go through 4 different methods to set up Kafka Oracle integration, and you can then choose whichever one is most convenient for you. Follow the below-mentioned methods to set up Kafka Oracle integration.
I. Kafka Connect Tool
Apache Kafka provides a tool named Kafka Connect, which enables the streaming of data between Kafka and other systems in a secure and reliable manner.
Kafka Connect can be used to enable both, incoming and outgoing connections.
In our discussion, a system that provides/produces data will be called a “Source”.
The system that consumes/receives the data will be called a “Sink”, because we can safely assume that the receiving system can ingest unlimited amounts of data, given its compaction or consumption strategies.
Kafka Connect can ingest data from multiple databases and application servers into Kafka topics, and supply this data for consumption by other systems down the line.
Also, an export job in Kafka Connect can deliver data from pre-existing Kafka topics into databases like Oracle for querying or batch processing.
Typically, the steps to follow for Kafka Oracle integration would be:
Step 1. Configure Startup Worker Parameters
Configure startup worker parameters like:
- bootstrap.servers – list of Kafka servers used to bootstrap connections to Kafka.
- key.converter – Specify the class that will convert from Kafka Connect format to the serialized form that is written to Kafka. This will take care of the conversion of keys in messages written to or read from Kafka.
- value.converter – This will take care of the conversion of values in messages written to or read from Kafka.
- offset.storage.file.filename – File to store the offset data for Kafka Oracle Integration.
- config.storage.topic – Whilst using the distributed mode, this parameter specifies the topic to use for storing connector and task configurations.
- offset.storage.topic – topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction.
- Connector configurations – like name, connector class, tasks.max, etc.
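Taken together, a minimal standalone worker configuration might look like the following sketch. The broker address, converter choice, and file paths are illustrative assumptions, not values from this article:

```properties
# connect-standalone.properties — a minimal worker sketch (values are placeholders)
bootstrap.servers=localhost:9092

# Converters for keys and values in messages written to / read from Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false

# Standalone mode stores offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets

# In distributed mode you would instead configure storage topics, e.g.:
# config.storage.topic=connect-configs
# offset.storage.topic=connect-offsets
```

Connector-level settings such as name, connector.class, and tasks.max then go in a separate connector configuration, shown later in this article.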
Step 2. Define the Transformations that you Need
In most cases, you will need some transformations to be applied to either incoming data or outgoing data, as the data format stored in Kafka partitions will be different from the source/sink databases.
- transforms – List of aliases for the transformation and the order in which the transformations should be applied.
- transforms.$alias.type – Fully qualified class name for the transformation.
- transforms.$alias.$transformationSpecificConfig – Configuration properties.
Some inbuilt transformations that you can apply are InsertField, ReplaceField, ExtractField, SetSchemaMetadata, and Filter (which removes messages from further processing).
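As a sketch of the pattern above, the properties below chain two of the inbuilt transformations: InsertField to tag each record with its origin, and ReplaceField to drop a sensitive column. The alias names, field names, and values are made up for illustration, and the `exclude` property name follows recent Kafka versions:

```properties
# Apply two transformations, in the order listed under "transforms"
transforms=addSource,dropSsn

# InsertField: add a static field to every record value
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=data_source
transforms.addSource.static.value=oracle

# ReplaceField: remove a field from the record value
transforms.dropSsn.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.dropSsn.exclude=SSN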
Step 3. Specify your Predicates
If you want to apply the above transformation selectively to only certain messages, that fulfill a certain condition, specify them in this section.
- predicates – Set of aliases for the predicates to be applied.
- predicates.$alias.type – Fully qualified class name for the predicate.
- predicates.$alias.$predicateSpecificConfig – Configuration properties for the predicate.
Some inbuilt predicates you can use are TopicNameMatches, HasHeaderKey (matches records that have a header with the given key), and RecordIsTombstone.
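For example, the following sketch applies the Filter transformation only to records whose topic name matches a pattern, using the TopicNameMatches predicate. The aliases and the topic pattern are illustrative assumptions:

```properties
# Filter (drop) records, but only where the predicate below matches
transforms=dropAudit
transforms.dropAudit.type=org.apache.kafka.connect.transforms.Filter
transforms.dropAudit.predicate=isAuditTopic

# Predicate: true for records on topics matching the given regex
predicates=isAuditTopic
predicates.isAuditTopic.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
predicates.isAuditTopic.pattern=oracle-audit-.*
```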
Step 4. Specify your Error Reporting and Logging Options
Kafka Connect provides error reporting and a user can decide to take an alternative action or ignore the errors altogether.
Some frequently used options here are:
- errors.log.enable=true
- errors.log.include.messages=true – also log the problem record’s key/value/headers to the log.
- errors.deadletterqueue.topic.name – the topic to use as a dead letter queue; setting it enables the dead letter queue (sink connectors only). The related errors.deadletterqueue.context.headers.enable property adds error context headers to the dead-letter records.
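Put together, an error-handling section of a connector config might look like this sketch. The dead letter queue topic name is an assumption, the errors.tolerance setting is added here for completeness, and the dead letter queue settings apply to sink connectors:

```properties
# Tolerate bad records instead of failing the task
errors.tolerance=all

# Log failures, including the offending record's key/value/headers
errors.log.enable=true
errors.log.include.messages=true

# Route failed records to a dead letter queue topic (sink connectors only)
errors.deadletterqueue.topic.name=dlq-oracle-sink
errors.deadletterqueue.context.headers.enable=true
```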
Step 5. Start the Standalone Connector or Distributed Mode Connector
You can run the workers in two different modes: standalone and distributed. Standalone mode is better suited to environments that lend themselves to single agents (e.g. sending logs from web servers to Kafka), or to development. Distributed mode is more suitable where a single source or sink may carry heavy data volumes (e.g. sending data from Kafka to HDFS), as it offers scalability and high availability to minimize downtime.
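Assuming a standard Apache Kafka distribution, the workers are started with the scripts shipped in Kafka’s bin directory; the property file names below are placeholders:

```shell
# Standalone mode: one worker process, connector configs passed on the command line
bin/connect-standalone.sh config/connect-standalone.properties config/oracle-source.properties

# Distributed mode: start one or more workers; connectors are then
# created through the REST API rather than via local property files
bin/connect-distributed.sh config/connect-distributed.properties
```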
Step 6. Use REST APIs to Manage your Connectors
You can define your listeners and actions. The REST API is used to monitor/manage Kafka Connect, as well as for the Kafka Connect cross-cluster communication.
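For instance, assuming a Connect worker listening on the default port 8083, the curl sketch below lists connectors, creates or updates one from a JSON payload, and checks its status. The connector name and its config are made up for illustration:

```shell
# List the connectors currently running on this worker
curl -s http://localhost:8083/connectors

# Create (or update) a connector from a JSON config
curl -s -X PUT -H "Content-Type: application/json" \
  --data '{"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector","tasks.max":"1"}' \
  http://localhost:8083/connectors/oracle-source/config

# Check the connector's status, or restart it after a failure
curl -s http://localhost:8083/connectors/oracle-source/status
curl -s -X POST http://localhost:8083/connectors/oracle-source/restart
```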
Kafka Connect Features
Kafka Connect has many useful features, such as:
- Standalone as well as the Distributed mode
- REST APIs, in case you want to manage multiple connectors in your cluster
- Automatic offset management – Kafka Connect can manage the offset commit process automatically so connector developers do not need to worry about this error-prone part of connector development
- Scalability and Bridging between stream-based systems and batch processing systems
- Oracle CDC to Kafka for capturing changes to rows in your Oracle database and reflecting those changes from Oracle CDC to Kafka topics.
The downside of Kafka Connect is the need to configure many parameters and make sure that they do not conflict with each other or impede performance. Also, you must know how Kafka works and its background intricacies.
Develop your Custom Connector and Use the Connect API
You can also establish Kafka Oracle integration via Connect API. If you want your connector to do much more than the connector functions and transformation provided by default, you can develop your own custom Connector too.
Kafka provides a low-level Connector API for developing a custom connector.
A full discussion, and the code needed to achieve this, is out of scope for this article, but it can be a viable option for experienced developers.
You can see the Connect API documentation here.
Using Oracle CDC to Kafka
Oracle CDC to Kafka also lets you achieve Kafka Oracle integration. Oracle CDC (Change Data Capture) works in a publish-subscribe fashion that is somewhat similar to Kafka’s.
Oracle CDC to Kafka defines 2 abstractions of Publishers and Subscribers. It captures changes to the database tables and user actions and then makes this data available to applications or individuals (Subscribers).
Oracle CDC to Kafka captures change data in two ways:
1. Synchronous – Synchronous capturing in Oracle CDC to Kafka uses database triggers, allowing changed data to be captured immediately, as soon as a DML command is executed.
2. Asynchronous – Asynchronous capturing in Oracle CDC to Kafka operates without triggers. This Oracle CDC to Kafka mode reads the data sent to the redo log as soon as the SQL statement containing the DML operation is committed.
A) Using GoldenGate
The easier option for Oracle CDC to Kafka is to use Oracle’s proprietary GoldenGate tool (or Attunity Replicate, Dbvisit Replicate, or Striim) for Kafka Oracle integration.
The downside here is that these tools are heavily priced (more than 17K per CPU).
B) Using Kafka’s JDBC Connector
The second option for Oracle CDC to Kafka is by using Kafka’s JDBC connector which allows you to connect with many RDBMS like Oracle, SQL Server, MySQL, DB2, etc.
This option requires a Kafka Connect runtime.
The first step is to configure the JDBC connector, specifying parameters like
- the connection details.
- topic.prefix – prefix prepended to table names to generate the Kafka topic names.
- mode – bulk, incrementing, timestamp, timestamp+incrementing, etc.
Use incrementing mode in this Oracle CDC to Kafka method if you need to capture only new rows and are not interested in changes to older rows. Use timestamp mode for capturing a time-based detection of new and modified rows.
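Pulling these parameters together, a JDBC source connector configuration for Oracle might look like the sketch below. The connection URL, credentials, table, and column names are placeholders, and the connector class assumes the Confluent JDBC connector is installed in the Connect runtime:

```properties
name=oracle-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1

# Connection details (placeholders)
connection.url=jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1
connection.user=kafka_user
connection.password=secret

# Capture new rows and updates using a timestamp plus an incrementing key
mode=timestamp+incrementing
timestamp.column.name=UPDATED_AT
incrementing.column.name=ORDER_ID
table.whitelist=ORDERS

# Rows from table ORDERS would land in the topic "oracle-ORDERS"
topic.prefix=oracle-
poll.interval.ms=5000
```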
This option will require programming resources and any transformation needed will need to be programmed. Some open-source tools can help a bit, but you will need to have insights into backend programming and JDBC functioning.
Hevo Data provides its users with a simpler platform for integrating data from 100+ sources such as Kafka and Oracle Database into a Data Warehouse for analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice.
Using Hevo, you can connect your SaaS platforms, databases, etc. to any data warehouse of your choice, without writing any code or worrying about maintenance.
If you are interested, you can try Hevo by signing up for the 14-day free trial. Visit our unbeatable pricing page for more information on Hevo’s affordable pricing plans.
Share with us your experience of learning about Kafka Oracle Integration in the comment box below. Have any questions on Kafka Oracle Integration? Let us know.