Businesses have a lot of data residing in multiple software in varying formats. This data will need to be coalesced in some way, to get useful insights and analytics on the performance of the business. Many companies use Kafka Oracle integration for the publish-subscribe messaging system and allow for the publishing and consumption of messages from a Kafka topic.
Kafka can act as a pipeline that can register all the changes happening to the data, and move them between source and destination.
In this article, we will see how to set up Kafka Oracle integration.
Table of Contents
- Introduction to Kafka
- Introduction to Oracle
- Tools and Techniques for Setting up Kafka Oracle Integration
- I. Kafka Connect Tool
- Step 1. Configure Startup Worker Parameters
- Step 2. Define the Transformations that you Need
- Step 3. Specify your Predicates
- Step 4. Specify your Error Reporting and Logging Options
- Step 5. Start the Standalone Connector or Distributed Mode Connector
- Step 6. Use REST APIs to Manage your Connectors
- Kafka Connect Features
- II. Develop your Custom Connector and Use the Connect API
- III. Using Oracle Change Data Capture(CDC)
- IV. Use HEVO for Kafka Oracle Integration
- I. Kafka Connect Tool
Hevo, A Simpler Alternative to Integrate your Data for Analysis
Hevo offers a faster way to move data from databases or SaaS applications such as Oracle and Kafka, into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.Get started with hevo for free
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Introduction to Kafka
Kafka is a stream-based, distributed message broker software that receives messages from publishers and distributes them to subscribers. Kafka stores messages in physically distributed Locations, Processes, Streams, and Responses to events.
To reduce the overhead of network round trips, Kafka groups messages together forming the “Message Set” abstraction, which leads to larger Network Packets, larger Sequential Disk operations, contiguous Memory Blocks, etc., allowing Kafka to turn a bursty stream of random message writes into linear writes. Kafka is used for Event Processing, Real-Time Monitoring, Log Aggregation, and Queuing.
In the next sections, you will understand Data Organization in Kafka and also learn about Kafka Replication in detail.
Introduction to Oracle
Oracle is a popular Relational Database Management system (RDBMS) known as Oracle database or OracleDB or simply Oracle. It is developed and managed by Oracle Corp.
OracleDB is a cross-platform application and can smoothly run on various operating systems like Windows/Unix/Mac, etc. Oracle is the first database designed purely for business and enterprise grid computing to manage information. Oracle supports Structured Query Language(SQL) to interact with the data, and the latest stable version is Oracle 12c.
As compared to other databases, Oracle is highly scalable, portable, and easily programmable. Any person who has some basic knowledge of SQL can execute queries efficiently.
Now that you have a basic understanding of both platforms, let’s dive straight into the Kafka Oracle integration part.
Tools and Techniques for Setting-up Kafka Oracle Integration
In this section, you will go through 4 different methods to set up Kafka Oracle Integration, you can finally choose one from it that feels convenient. Follow the below-mentioned methods to set up Kafka Oracle integration.
I. Kafka Connect Tool
Apache Kafka provides a tool named Kafka Connect, which enables the streaming of data between Kafka and other systems in a secure and reliable manner.
Kafka Connect can be used to enable both, incoming and outgoing connections.
In our discussion, if a system is providing/producing the data it will be called a “Source”.
The system that consumes/receives the data will be called a “Sink” because we can safely assume that the receiver system can ingest unlimited amounts of data, given its compaction or consumption strategies.
Kafka Connect can ingest data from multiple databases and application servers into Kafka topics, and supply this data for consumption by other systems down the line.
Also, an export job in Kafka Connect can deliver data from pre-existing Kafka topics into databases like Oracle for querying or batch processing.
Typically the steps for Kafka Oracle Integration to follow would be:
Step 1. Configure Startup Worker Parameters
Configure startup worker parameters like:
- bootstrap.servers – list of Kafka servers used to bootstrap connections to Kafka.
- key.converter – Specify the class that will convert from Kafka Connect format to the serialized form that is written to Kafka. This will take care of the conversion of keys in messages written to or read from Kafka.
- value.converter – This will take care of the conversion of values in messages written to or read from Kafka.
- offset.storage.file.filename – File to store the offset data for Kafka Oracle Integration.
- config.storage.topic – Whilst using the distributed mode, this parameter specifies the topic to use for storing connector and task configurations.
- offset.storage.topic – topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction.
- Connector configurations – like name, connector class, tasks.max, etc.
Step 2. Define the Transformations that you Need
In most cases, you will need some transformations to be applied to either incoming data or outgoing data, as the data format stored in Kafka partitions will be different from the source/sink databases.
- transforms – List of aliases for the transformation and the order in which the transformations should be applied.
- transforms.$alias.type – Fully qualified class name for the transformation.
- transforms.$alias.$transformationSpecificConfig – Configuration properties.
- Some inbuilt transformations that you can apply are InsertField, ReplaceField, ExtractField, SetSchemaMetadata, Filter(removes messages from further processing), etc.
Step 3. Specify your Predicates
If you want to apply the above transformation selectively to only certain messages, that fulfill a certain condition, specify them in this section.
- predicates – Set of aliases for the predicates to be applied.
- predicates.$alias.type – Fully qualified class name for the predicate.
- predicates.$alias.$predicateSpecificConfig – Configuration properties for the predicate.
Some inbuilt predicates you can use are TopicNameMatches, HasHeaderKey(matches records that have a header with the given key), RecordIsTombstone, etc.
Step 4. Specify your Error Reporting and Logging Options
Kafka Connect provides error reporting and a user can decide to take an alternative action or ignore the errors altogether.
Some frequently used options here are :
- errors.log.enable = true
- errors.log.include.messages =true – also log the problem record key/value/headers to the log.
- errors.dead letter queue.context.headers.enable – to enable or disable the dead letter queue.
Step 5. Start the Standalone Connector or Distributed Mode Connector
You can run the workers in two different modes- standalone and distributed. Standalone mode is better suited for environments that lend themselves to single agents (e.g. sending logs from webservers to Kafka), or development. The distributed mode is more suitable where a single source or sink may require heavy data volumes (e.g. sending data from Kafka to HDFS), in terms of scalability and high availability to minimize downtime.
Step 6. Use REST APIs to Manage your Connectors
You can define your listeners and actions. The REST API is used to monitor/manage Kafka Connect, as well as for the Kafka Connect cross-cluster communication.
Kafka Connect Features
Kafka Connect has many useful features like:-
- Standalone as well as the Distributed mode
- REST Apis in case you want to manage multiple connectors in your cluster
- Automatic offset management – Kafka Connect can manage the offset commit process automatically so connector developers do not need to worry about this error-prone part of connector development.
- Scalability and Bridging between stream-based system and batch processing systems
The downside of Kafka Connect is the need to configure many parameters and make sure that they do not conflict with each other or impede performance. Also, you must know how Kafka works and its background intricacies.
II. Develop your Custom Connector and Use the Connect API
You can also establish Kafka Oracle integration via Connect API. If you want your connector to do much more than the connector functions and transformation provided by default, you can develop your own custom Connector too.
Kafka provides a low-level Connector-API to develop a custom connector.
A full discussion and code on how to achieve this are out of scope for this discussion, but this can be a viable option for experienced developers.
You can see the Connect API documentation here.
III. Using Oracle Change Data Capture(CDC)
Oracle CDC also lets you achieve Kafka Oracle integration. Oracle Change Data Capture is a service that works in a fashion that is somewhat similar to Kafka.
CDC defines 2 abstractions of Publishers and Subscribers. It captures changes to the database tables and user actions, then makes this data available to applications or individuals (Subscribers).
CDC captures change data in 2 ways:-
1. Synchronous – Triggers in the database allow immediate capturing of changed data, as soon as a DML command is executed.
2. Asynchronous – If there are no triggers, CDC reads the data sent to the redo log, as soon as the SQL statement containing the DML operation is committed.
A) Using GoldenGate
The downside here is that these tools are priced a bit heavily (more than 17K per CPU).
B) Using Kafka’s JDBC Connector
Kafka’s JDBC connector allows you to connect with many RDBMS like Oracle, SQL Server, MySQL, and DB2, etc.
This option requires a Kafka Connect runtime.
The first step is to configure the JDBC connector, specifying parameters like
- the connection details.
- topic.prefix – prefix to prepend to table names.
- mode – bulk, incrementing, timestamp, timestamp+incrementing, etc.
Use incrementing mode if you need to capture only new rows and are not interested in changes to older rows. Use timestamp mode for capturing a time-based detection of new and modified rows.
This option will require programming resources and any transformation needed will need to be programmed. Some open-source tools can help a bit, but you will need to have insights into backend programming and JDBC functioning.
IV. Use HEVO for Kafka Oracle Integration
If you just want to achieve Kafka Oracle integration with just a few clicks, and delegate intrinsic details to a tried and tested tool, use Hevo. Establishing Kafka Oracle integration has never been this easy.visit our website to explore hevo
Hevo is a No-code data pipeline. It has pre-built integrations with 100+ sources. You can connect your SaaS platforms, databases, etc. to any data warehouse of your choice, without writing any code or worrying about maintenance.
If you are interested, you can try Hevo by sign up for the 14-day free trial.