Oracle to MongoDB Replication: 9 Easy Steps

on Data Replication, Database, MongoDB, NoSQL, Oracle • May 18th, 2022 • Write for Hevo

Oracle to MongoDB Replication | Hevo Data

The data replication process is needed to make a consistent copy of the data available across all the database nodes. Oracle databases can integrate with the MongoDB database that can accept and stream data from the Oracle database. With the Oracle database, you can detect the changes to the databases using CDC (Change Data Capture). CDC refers to the process that identifies and tracks changes to data in a database. CDC can detect changes in operations in any databases like delete, update and delete events, which are sent to the other databases during ETL (Extract, Transform and Load), database migrations, and replications.

The data replication process is needed to make a consistent copy of the data available across all the database nodes. And the Oracle database allows you to simplify the process of replication changes of the MongoDB database with a few lines of code.

In this article, you will learn about the Oracle to MongoDB replication process.

Table of Contents

Prerequisites

Fundamental understanding of the integration between databases

What is Oracle?

Oracle Logo, Oracle to MongoDB Replication | Hevo data
Image Source

Developed in 1979, Oracle is a relational database management system (RDBMS) from Oracle corporation. Oracle is the best choice for cost-effective applications and data management solutions in organizations. To interact with the databases, Oracle supports SQL as a query language. Oracle is also compatible with operating systems like Windows, Unix, Linus, and macOS. 

Features of Oracle

1) Availability

Oracle’s Oracle Data Guard capability ensures excellent database availability. This functionality enables the use of the secondary database as a copy of the primary database during any failure. As a result, all typical operations, such as backups and partial failures, do not restrict the database from being utilized.

2) Security

Oracle consists of a mechanism for controlling and accessing the database to prevent unauthorized access. It provides high security due to its Oracle Advanced Security features. It comprises two solutions that protect the Oracle databases – Transparent Data Encryption (TDE) and Data Redaction.

3) Scalability

Oracle database consists of features like Real Application Cluster (RAC) and Portability that make Oracle Database scalable based on usage. Oracle includes capabilities like rolling instance migrations, maintaining application continuity, quality of service management, performing upgrades, and more in a clustered environment.

4) Performance

Oracle provides high performance due to its optimization tools like Oracle Advanced Compression, Oracle Database in Memory, Oracle Real Application Testing, and Oracle TimesTen Application-Tier Database Cache.

Simplify MongoDB ETL Using Hevo’s No-Code Data Pipeline

Hevo Data, a Fully-managed Data Pipeline platform, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources such as Oracle & MongoDB to the Data Warehouses of your choice. To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

GET STARTED WITH HEVO FOR FREE

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today to experience an entirely automated hassle-free Data Replication!

What is MongoDB?

MongoDb Logo, Oracle to MongoDB Replication | Hevo Data
Image Source

Developed in 2009, MongoDB is an open-source document-oriented database designed to store large data volumes and allow users to work with data efficiently. MongoDB is classified as NoSQL since its data is not kept in tables.

MongoDB provides official driver support for all the popular languages like C, C++, Net, Go, Java, Node.js, Perl, PHP, Python, Motor, Ruby, and more. Therefore, you can create an application using these languages.

Records in MongoDB are called documents, consisting of data in the pairs of key and value. Documents are fundamental units in MongoDB. These documents are similar to JavaScript Object Notation (JSON) but use a variant called Binary JSON (BSON). Documents consist of fields identical to the columns in a relational database. A set of documents is called collection in MongoDB, which can have any data, but the data cannot be spread across different databases. 

Features of MongoDB

1) Schema-Less Database

Schema-less Database means that a collection can hold different types of documents in it. In other words, a single collection can hold multiple documents, and such documents might consist of different numbers of fields, content, and size. Since every document is not similar to another document in the MongoDB database, MongoDB provides scalability to the databases.

2) Replication

MongoDB provides high availability and redundancy due to its replication. It creates multiple copies of the data and sends such data to the different servers. Therefore, if any server fails, data is retrieved from the other servers.

3) Document-Oriented

All databases in MongoDB are stored in documents instead of tables like RDBMS. Documents hold data in pairs of keys and values, making it more versatile than RDBMS.

4) Scalability

MongoDB provides horizontal scalability in databases using Sharding. Sharding distributes data on multiple servers, where large volumes of data are partitioned into chunks using the shard key. Such data chunks are then distributed across shards that evenly reside on many physical servers.

Oracle to MongoDB Replication

Using the Apache Kafka, the Confluent Oracle CDC Connector, and the MongoDB connector for Apache Kafka, users can easily stream database changes from Oracle to MongoDB.

Image Source

For connecting Oracle to MongoDB, you need a self-contained docker-compose environment that consists of the below requirements

  1. Oracle Database
  2. MongoDB
  3. Apache Kafka
  4. Confluent KSQL

1) Oracle to MongoDB Replication: Prepare the Oracle Docker Image

You can remove the section “database” from the docker-compose file if you have an Oracle database. You can also pull the Oracle Database Enterprise Edition from the Docker hub. Accept the Oracle terms and conditions and log in to your docker account with docker login. Use pull

store/oracle/database-enterprise:12.2.0.1-slim

 to download the docker image locally.

2) Oracle to MongoDB Replication: Launch the Docker environment

The docker-compose file will launch the below software

  1. Apache Kafka, including ZooKeeper, REST API, KSQL, Schema Registry
  2. Apache Kafka Connect
  3. MongoDB Connector for Apache Kafka
  4. Confluent Oracle CDC Connector
  5. Oracle Database Enterprise

Ensure that you have your Oracle environment ready to launch the environment and then clone the Github repository. Build the docker-compose file.

docker-compose up -d --build

After that, you need to configure your Oracle environment to be used by the Confluent CDC Connector.

3) Oracle to MongoDB Replication: Configuring Oracle instance

If you are using Oracle within the docker environment, use the docker exec command, as shown below.

docker exec -it oracle bash -c "source /home/oracle/.bashrc; sqlplus /nolog"
connect / as sysdba

4) Oracle to MongoDB Replication: Configuring Oracle for CDC Connector

Check if the database is in archive log mode.

select log_mode from v$database;

If the database is not in archive log mode, use the below command.

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;

Verify the archive mode.

select log_mode from v$database

Enable supplemental logging for all columns. Supplemental logging in Oracle refers to all columns or selected columns specified for extra logging. It is needed for generating changelogs for the update and delete statements. 

ALTER SESSION SET CONTAINER=cdb$root;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

The below output should appear on the Oracle CDB.

MongoDB Output, Oracle to MongoDB Replication | Hevo Data
Image Source

Create some objects in the database using the below commands.

CREATE TABLE C##MYUSER.emp
(
   i INTEGER GENERATED BY DEFAULT AS IDENTITY,
   name VARCHAR2(100),
   lastname VARCHAR2(100),
   PRIMARY KEY (i)
) tablespace sysaux;
  
insert into C##MYUSER.emp (name, lastname) values ('Bob', 'Perez');
insert into C##MYUSER.emp (name, lastname) values ('Jane','Revuelta');
insert into C##MYUSER.emp (name, lastname) values ('Mary','Kristmas');
insert into C##MYUSER.emp (name, lastname) values ('Alice','Cambio');
Commit;

5) Oracle to MongoDB Replication: Creating a Kafka Topic

Open a new terminal for connecting to your Kafka Server.

docker exec -it broker /bin/bash

After connecting to Kafka, create the Kafka topic.

kafka-topics --create --topic SimpleOracleCDC-ORCLCDB-redo-log 
--bootstrap-server broker:9092 --replication-factor 1 
--partitions 1 --config cleanup.policy=delete 
--config retention.ms=120960000

What makes Hevo’s ETL Process Best-In-Class

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, No-code platform empowers you with everything you need to have for a smooth data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

6) Oracle to MongoDB Replication: Configuring the Oracle CDC Connector

The oracle-cdc-source.json file in the Github repository contains all information about the configuration of Confluent Oracle CDC Connector. Use the below command to execute the CDC Connector. 

curl -X POST -H "Content-Type: application/json" -d @oracle-cdc-source.json  http://localhost:8083/connectors

7) Oracle to MongoDB Replication: Setting up KSQL Data Flows in Kafka

Since the Oracle CRUD events arrive in the Kafka topic, use KSQL to stream these events into a new topic such that the MongoDB connector can consume them. 

docker exec -it ksql-server bin/bash
ksql http://127.0.0.1:8088

Enter the below commands to create streams in KSQL. KSQL is a Java library build on top of Kafka Streams to enrich, transform and process real-time streams of data.

CREATE STREAM CDCORACLE (I DECIMAL(20,0), NAME varchar, LASTNAME varchar, op_type VARCHAR) WITH ( kafka_topic='ORCLCDB-EMP', PARTITIONS=1, REPLICAS=1, value_format='AVRO');

CREATE STREAM WRITEOP AS
  SELECT CAST(I AS BIGINT) as "_id",  NAME ,  LASTNAME , OP_TYPE  from CDCORACLE WHERE OP_TYPE!='D' EMIT CHANGES;

CREATE STREAM DELETEOP AS
  SELECT CAST(I AS BIGINT) as "_id",  NAME ,  LASTNAME , OP_TYPE  from CDCORACLE WHERE OP_TYPE='D' EMIT CHANGES;

Verify the Streams using the below command.

SHOW STREAMS;

Output:

Output, Oracle to MongoDB Replication | Hevo Data
Image Source

8) Oracle to MongoDB Replication: Configuring MongoDB Kafka sink connector

The MongoDB Kafka sink connector is a connector that reads data from Apache Kafka and writes to MongoDB. 

The below code consists of the configuration of the MongoDB Connector for the Apache Kafka.

{
  "name": "Oracle",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "WRITEOP",
    "connection.uri": "mongodb://mongo1",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneBusinessKeyTimestampStrategy",
    "database": "kafka",
    "collection": "oracle",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.overwrite.existing": "true",
    "document.id.strategy.partial.value.projection.type": "allowlist",
    "document.id.strategy.partial.value.projection.list": "_id",
    "errors.log.include.messages": true,
    "errors.deadletterqueue.context.headers.enable": true,
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://schema-registry:8081",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter"

  }
}

The sink process consumes records from the WRITEOP topic and saves the data to MongoDB. UpdateOneBusinessKeyTimestampStrategy is the writemodel strategy that performs the upsert operation using the filter defined on PartialValueStrategy property. Upsert is used for insert and update operations in MongoDB, which is a combination of update and insert. This configuration script is written in mongodb-sink.json file in the Github repository. To configure the configuration, use the below command.

curl -X POST -H "Content-Type: application/json" -d @mongodb-sink.json  http://localhost:8083/connectors

The delete events are written in the DELETEOP topic and are sent to MongoDB using the below sink configuration.

{
  "name": "Oracle-Delete",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "DELETEOP",
    "connection.uri": "mongodb://mongo1”,
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy",
    "database": "kafka",
    "collection": "oracle",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.overwrite.existing": "true",
    "document.id.strategy.partial.value.projection.type": "allowlist",
    "document.id.strategy.partial.value.projection.list": "_id",
    "errors.log.include.messages": true,
    "errors.deadletterqueue.context.headers.enable": true,
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://schema-registry:8081"

  }
}

curl -X POST -H "Content-Type: application/json" -d @mongodb-sink-delete.json  http://localhost:8083/connectors

Such sink process uses the DeleteOneBusinessKeyStrategy writemodel strategy. The write model strategy is a class that defines how your sink connector should write data using write models. In the above configuration, the sink process reads from the DELETEOP topic and deletes documents in MongoDB based on the filter defined on PartialValueStrategy property.

9) Oracle to MongoDB Replication: Writing Data to Oracle

After your environment is set up and configured, go to the Oracle database and insert the below data.

insert into C##MYUSER.emp (name, lastname) values ('Juan','Soto');
insert into C##MYUSER.emp (name, lastname) values ('Robert','Walters');
insert into C##MYUSER.emp (name, lastname) values ('Ruben','Trigo');
commit;

Notice the data arrived in MongoDB by accessing the MongoDB shell.

docker exec -it mongo1 /bin/mongo

The inserted data is now available in MongoDB, as shown below.

Output, Oracle to MongoDB Replication | Hevo Data
Image Source

If you update any record in Oracle, you will see the updated record in the MongoDB document.

For example,

UPDATE C##MYUSER.emp SET name=’Rob’ WHERE name=’Robert’;
COMMIT;

Then, the document is updated in MongoDB, as below.

{
        "_id" : NumberLong(11),
        "LASTNAME" : "Walters",
        "NAME" : "Rob",
        "OP_TYPE" : "U",
        "_insertedTS" : ISODate("2021-07-27T10:25:08.867Z"),
        "_modifiedTS" : ISODate("2021-07-27T10:25:08.867Z")
}

If you delete the data in Oracle, as below:

DELETE FROM C##MYUSER.emp WHERE name=’Rob’; COMMIT;.

The documents having the name ‘Rob’ are deleted in MongoDB.

Conclusion

In this article, you learned to connect Oracle to MongoDB replication. This article focused on streaming databases from Oracle to MongoDB using Apache Kafka, the Confluent Oracle CDC Connector, and the MongoDB Connector for Apache Kafka. Generally, organizations dealing with massive data need a reliable, robust, and secure data management system. Therefore, Oracle provides organizations with a versatile data management system that can manage data, prevent any security threats, and provide seamless access to applications.

visit our website to explore hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) such as Oracle & MongoDB allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

No-code Data Pipeline For MongoDB & Oracle