Almost all companies today, irrespective of their market position or size, leverage the data collected to analyze their business & customers, making smarter and informed business decisions that help drive business growth and profitability. Oracle CDC (Change Data Capture) is a technology that identifies and captures data added to, updated, and deleted from Oracle tables, making it available for applications to consume.

Hence, businesses are constantly looking for ways to manage their data better. One of the most widely used practices is Change Data Capture. This article will help you understand Change Data Capture (CDC) and how to set up Oracle CDC (Change Data Capture).

Oracle CDC using Transaction Logs (Log-based CDC)

Log-based CDC is the best Oracle CDC method. This involves minimal load on the Oracle source system and can replicate all the changes accurately. The Oracle CDC method uses transaction logs, mines the Oracle redo-logs, and the archive logs which helps Oracle recover on its own during a failure. After the logs are mined, the transactions can be applied on the destination similar to the source. This provides a real-time view of the data on the destination. All these data can be accessed on the destination or the target. Oracle CDC with transactional logs works on all Oracle versions including 12c and 19c.

What makes Hevo’s ETL Process Best-in-Class

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.

Start for free now!

Get Started with Hevo for Free

The Shift from Trigger-Based to Log-Based Oracle CDC

Oracle 9i initially provided users of its database with a built-in mechanism for tracking and storing changes, just like native CDC. The major drawback of this initial version of Oracle CDC was that it depended on the placement of triggers on tables in the source database. So, DBAs were reluctant to use this technology. Oracle later released Oracle 10g with database redo logs and a built-in Oracle replication tool for capturing and transmitting data changes without invasive database triggers.

The new form of Oracle CDC was a log-based form of CDC that was lightweight and did not require changing the structure of source tables. Though it has been a popular database feature, Oracle no longer supports CDC with the release of Oracle 12c. This encourages users to switch to the Oracle replication software solution called Oracle GoldenGate.

Change Sources and Modes of Change Data Capture

Change Data Capture comes with synchronous and asynchronous modes for capturing change data. In this section, you will understand how each mode of Change Data Capture is performed with Oracle Change Data Capture Example.

What is Synchronous Oracle CDC (Change Data Capture)

A Synchronous Oracle CDC (Change Data Capture) is implemented using triggers that inserts entries to a Change Capture table every time a data modification operation is executed. Triggers are hooks that get executed as part of the transactions. The first step is creating a user to serve as the Change Data Publisher. This user must have all access to the namespace and the table from which change data is to be captured.

Assuming this user is created, you will now go to the core part of this process, i.e., creating a Change Set, Change Table, and subscribing to changes.

Synchronous Change Data Capture Configuration
fSynchronous Oracle CDC Configuration (Image via Oracle)

Steps to Set up Synchronous Oracle CDC (Change Data Capture)

Users can set up Synchronous Oracle CDC (Change Data Capture) by implementing the following steps:

Step 1: Creating a Change Set

To create a change set, users can execute the following command in Oracle:

BEGIN
    DBMS_CDC_PUBLISH.CREATE_CHANGE_SET( 
    change_set_name    => 'PRODUCT_SET', 
    description        => 'Change set for product change info', 
    change_source_name => 'SYNC_SOURCE');
END;

In the above query, SYNC_SOURCE is a predefined source defined by Oracle to implement Synchronous Oracle CDC (Change Data Capture).

Step 2: Creating a Change Table

Using the Change Set defined previously, create a Change Table that will capture information from the Triggers by executing the following query:

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(

  owner             => 'cdcuser',
   change_table_name => 'products_ct', 
   change_set_name   => 'PRODUCT_SET',
   source_schema     => 'SH',
   source_table      => 'PRODUCTS',
   column_type_list  => 'ID NUMBER(6), 
                         NAME VARCHAR2(50),
                         PRICE NUMBER(8,2)', 
   capture_values    => 'both',
   rs_id             => 'y', 
   row_id            => 'n',
   user_id           => 'n',
   timestamp         => 'n',
   object_id         => 'n',
   source_colmap     => 'y',
   target_colmap     => 'y',
   options_string    => 'TABLESPACE TS_PRODUCT');
END;

In the above command, column_type_list attribute captures the list of columns for which changes will be captured.

Step 3: Creating a Subscription to Capture Data

The next step is to create a Subscription to capture the data that is being inserted to the Change Table. This can be done by executing the following query:

BEGIN
       DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
       change_set_name   => 'PRODUCT_SET',
       description       => 'Change data for PRODUCTS',
       subscription_name => 'PRODUCT_SUB');
END;

Step 4: Subscribing to Changes on Source Table

Subscribe to the source table changes using the subscription object created in the previous step by executing the following query:

BEGIN
       DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
       subscription_name => 'PRODUCT_SUB',
       source_schema     => 'SH',
       source_table      => 'PRODUCTS',
       column_list       => 'ID, NAME,PRICE',
       subscriber_view   => 'PRODUCT_VIEW');
END;

Step 5: Activating the Subscription

You can run the following query to activate the subscription:

BEGIN
   DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
       subscription_name => 'PRODUCT_SUB');
END;

Step 6: Loading Data from Subscribed Table

Loading data from the subscribed table is done based on a collection window for the next available set of change data. Every time the following command is called, new change data that was created since the last call of this command is added to the subscription window.

BEGIN
   DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
       subscription_name => 'PRODUCT_SUB');
END;

Step 7: Using Select Query to Load Changes

You can now use a select query to get the new changes and process them as per the target database.

SELECT ID, NAME, PRICE FROM SALES_VIEW;

To copy data to another destination database, the above action will have to be performed using a script that can process these records and insert to the target database.

The biggest disadvantage of the above approach is that Triggers affect the performance of the source database. For cases where this cannot be afforded, Oracle provides another alternative to implement Oracle CDC (Change Data Capture). 

Understanding Asynchronous Oracle CDC (Change Data Capture)

Asynchronous Oracle CDC (Change Data Capture) is implemented using Redo Logs. Redo Logs are logs that keep a track of all activities in a database. As the name suggests, this is an asynchronous process and does not cause any performance hit.

Asynchronous HotLog Configuration
Asynchronous HotLog Configuration (Image via Oracle )

Steps to Set up Asynchronous Oracle CDC (Change Data Capture)

Users can set up Asynchronous Oracle CDC (Change Data Capture) by implementing the following steps:

Step 1: Configuring Database

The first step is to enable FORCE LOGGING mode on your database. You can do so by executing the following query:

ALTER DATABASE FORCE LOGGING;

Step 2: Enabling Supplemental Logging

Supplemental Logging allows the database to create entries in the log for all changes to the table including UPDATE statements. It can be enabled by executing the following query:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Step 3: Enabling Logging for Columns

First, you would need to identify the columns that you need to capture changes from. If you choose to capture changes from all columns, you can run the following query:

ALTER TABLE ts.products ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Step 4: Preparing Source Tables

You can prepare source tables for Oracle CDC (Change Data Capture) by executing the following query:

BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(TABLE_NAME => 'ts.products'); END;

Step 5: Creating Change Set

You can create a Change Set with the source as HOTLOG_SOURCE by executing the following query:

BEGIN
       DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
       change_set_name => 'PRODUCT_SET',
       description => 'Change set for product info',
       change_source_name => 'HOTLOG_SOURCE',
       stop_on_ddl => 'y',
       begin_date => sysdate,
       end_date => sysdate+5);
END;

HOTLOG_SOURCE is a predefined source defined by Oracle for Asynchronous Oracle CDC (Change Data Capture).

Step 6: Creating Change Table

You can create a Change Table by executing the following query:

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
   owner              => 'cdcpub',
   change_table_name  => 'products_ct', 
   change_set_name    => 'PRODUCT_SET',
   source_schema      => 'TS',
   source_table       => 'PRODUCTS',
   column_type_list   => 'ID NUMBER(6), NAME VARCHAR2(50),
      PRICE NUMBER(8,2)',
   capture_values     => 'both',
   rs_id              => 'y',
   row_id             => 'n',
   user_id            => 'n',
   timestamp          => 'n',
   object_id          => 'n',
   source_colmap      => 'n',
   target_colmap      => 'y',
   options_string     => 'TABLESPACE TS_PRODUCT_SET');
END;

Step 7: Enabling Change Set

Finally, you have to enable the Change Set for Oracle CDC (Change Data Capture) by executing the following query:

BEGIN
   DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
      change_set_name => 'PRODUCT_SET',
      enable_capture => 'y');
END;

That’s it, you now have set up Oracle CDC (Change Data Capture) for your tables which means that all the changed data will be captured in the changing table. However, you still need to implement a mechanism that creates a subscription to this table and process accordingly.

Oracle Replication Types with CDC

Oracle CDC using Timestamps

Audit Timestamps can be used to replicate changed records because they record the Date and Time when a record was entered or updated.

Timestamp Disadvantages:

  • Since deleted records cannot be detected, this technique is unsuitable for most applications.
  • This method has a high overhead because queries must be conducted continuously on the source Oracle Database and tables.
  • The Source Application may not reliably capture Audit Dates and Timestamp Columns, resulting in improper replication. For the CDC, there are concerns with Timestamps and Triggers.

Using Triggers

On the tables that will be replicated, Triggers can be defined. These are triggered by every insert, update, and deletion, and the Timestamps can be gathered into another change table. The Audit Table can then be used for replication.

Disadvantages:

Triggers also increase the overheads of the Oracle Database by putting a burden on it.

Transaction Logs in Oracle CDC 

The best Oracle CDC approach is log-based CDC, which places a minimal burden on the Oracle source system and accurately replicates all changes. Oracle’s CDC technique, which uses Transaction Logs, mines Oracle redo-logs and/or archive logs for its recovery in the event of a failure. When the logs are mined, transactions can be applied to the destination like they were applied to the source, giving the destination a real-time view of the data. This data can be retrieved on the goal or destination, typically a Data Lake, Data Warehouse, or Analytical Appliance. On ADLS Gen2, create an Azure Data Lake.

Data may then be combined across numerous Enterprise Systems and Applications to produce rich insights that can be used to uncover significant business opportunities, improve operational efficiencies, spot key business trends, and gain a competitive edge. Since, it only requires access to the Oracle redo and archive logs, Oracle CDC with transactional logs works on all Oracle Versions, including 12c and 19c. The format of these, however, can vary significantly between Oracle Versions.

Oracle CDC 12c: Oracle Streams Deprecation

Oracle Streams was a native CDC utility for Oracle Databases that was free and could be used for real-time data transport. Oracle Streams can push transactional updates automatically to a queue to be replicated. Oracle Streams is no longer supported as of Oracle 12c. Oracle encourages consumers to upgrade to Oracle GoldenGate, a premium version with substantial licensing fees.

Oracle CDC 19c: Oracle LogMiner Continuous Mining Deprecation

The most recent long-term support release of the Oracle Change Data Capture 19c, known as Oracle Database 19c. It brings in a number of improvements and features meant to boost manageability, security, and performance.

Oracle LogMiner is a program supplied by Oracle to Oracle Database purchasers that allows them to query logged database changes, mostly through SQL commands referencing data in Oracle redo logs. DBAs use it largely to query transaction logs. Oracle LogMiner includes a functionality that allows you to get real-time changes from Oracle redo logs. Most Third-Party CDC programs use this functionality to replicate Oracle data for their customers. Oracle has also deprecated this feature in version 19c in order to (you guessed it!) drive consumers to their paid service Oracle GoldenGate, which has significant license charges.

Oracle GoldenGate Replication

Oracle GoldenGate uses CDC to enable real-time, heterogeneous replication from sources. It’s a replication tool that comes as an Add-on license for Oracle’s Oracle Enterprise Edition Database or can be leased separately for non-Oracle Databases.

Oracle GoldenGate, on the other hand, may require the use of other Oracle-licensed products to provide efficient replication. To avoid Data Loss, Data Replication necessitates Table Construction on the destination based on table specifications at the source and effective data type mapping. Oracle Data Integrator (ODI), Oracle’s Extract, Transform, and Load (ETL) tool, is required for this.

Oracle GoldenGate will deliver CDC (Change Data Capture) replication in real-time, with incremental updates merged on the destination. CDC is known for its high throughput and low impact on source systems. Oracle GoldenGate for Big Data is required for big data targets such as Amazon S3 or Snowflake

Oracle Veridata is required to validate and reconcile data with the source. Oracle GoldenGate Studio will be required for a graphical user interface. Oracle GoldenGate Management Pack is required for quick alerting and monitoring capabilities (plugged into Oracle Enterprise Manager). Since all of these products must be licensed separately, using Oracle GoldenGate can be quite costly, not to mention the integration between them, which can add time and money to the process.

Oracle XStream

The real-time data integration feature Oracle XStream consists of the Capture, Propagate, and Apply components. However, Oracle has refocused on GoldenGate, a more complete solution with more platform compatibility and features.

This change suggests that users are urged to use GoldenGate for any upcoming data integration requirements. Moving to GoldenGate guarantees access to advanced capabilities and long-term compatibility with Oracle’s data integration strategy, even though XStrEAM may still be supported.

Oracle CDC RAC (Real Application Clusters) Replication

Oracle Real Application Clusters (RAC) is an Oracle Database Environment that is highly available and scalable. It enables customers to share storage and execute a single Oracle database across numerous machines. If one of the cluster’s servers fails, the Database instance will continue to function on the cluster’s remaining servers or nodes. Customers can also start small and scale up as needed by adding Nodes or Servers.

In an Oracle RAC setup, the fascinating part of Oracle CDC is that each Server/Node creates a unique Transaction Log, which is identified by a thread identifier and an incremental series. The Logs must be read and applied in the order in which the servers generate them, then stitched together. The Oracle CDC procedure becomes significantly more complicated as a result of this.

Oracle CDC for High Volume

Some Oracle setups follow patterns, with the majority of transactions arriving quickly for a few hours each day before returning to a more typical pace. The difficulty with processing in these circumstances is that the CDC process must be able to tolerate huge throughput spikes and scale appropriately so that replication occurs at the same rate as when the system is quieter. In these high-throughput contexts, most CDC solutions will experience significant lag when processing CDC.

Oracle CDC in a Catch-Up Scenario

Replication cannot be conducted when there is a problem accessing the Oracle Database due to network troubles or another issue, and there is a large backlog of transactions that must be duplicated after the problem is rectified. Catch-up must be done as though the CDC were working in the “High Throughput” scenario indicated above. Otherwise, Current Transactions may be severely delayed, if not impossible to catch up with! When you add in the complexity of a RAC environment, you’ve got yourself a potential data replication nightmare.

Remote Log Mining for Oracle CDC

This is a situation in which the Oracle Source System is overburdened. In this case, any additional activity, even the smallest load, could jeopardize the Oracle source Database’s Operating Performance. It’s possible that the sole alternative is to mine log files that aren’t connected to the Oracle Source Database. Since the logs can be mined on a different server, there is no load on the source.

Oracle CDC: Time Series Analytics

On the destination Data Lake or Data Warehouse, point-in-time queries are sometimes especially crucial for Analytics. The commit date from the Oracle Source, or the Timestamp that the records were saved to Oracle, rather than the Timestamp when they are loaded to the destination, is also essential in the SCD type2 history Timestamps. Any delays in the loading of data can have a negative impact on point-in-time analytics if the commit timestamp is not used in the SCD type 2 history creation.

Why does the process need to be accurate for Oracle CDC?

When an Application saves a Transaction, the change data for the underlying Oracle Tables are created. Maintaining referential integrity, i.e. bringing all the change data across for the transaction, is important when doing CDC. If only a portion of the data is replicated, it may cause referential integrity difficulties at the destination. Furthermore, even if the Oracle CDC process is error-free and delivers data with acceptable latency.

What to look for in an Oracle Replication Tool?

Hand-coding is much slower and more difficult than using an automated Oracle Replication Tool. It can be difficult to hand-code an Oracle CDC solution. Maintaining a hand-coded solution for future Oracle versions can also be problematic. As a result, employing an Oracle Replication Tool is a more straightforward option to consider.

Here are some things to think about while choosing an Oracle replication tool: 

  • Upgradation  to Oracle 19c 
  • Tool Operations with Oracle 12c and Oracle 19c 
  • High Throughput
  • Catch up and Recovery in the event of an Outage 
  • Enormous CDC volumes without difficulty 
  • Remote Work
  • Oracle Replication in Real-Time.

Benefits of Implementing Oracle CDC

  • Integrating Data in Real Time: Change Data Capture Oracle captures and delivers incremental changes made to Oracle tables, ensuring real-time data integration. Organizations may improve operational effectiveness, make well-informed choices, and preserve data consistency across systems using this capability. Businesses may achieve a competitive edge by utilizing Oracle’s change data capture to leverage the power of precise and up-to-date data integration.
  • Efficiency of Resources: Oracle CDC minimizes the requirement for complete data refreshes, lessens system impact, and enhances data operations to increase resource efficiency. It ensures that resources are used effectively and efficiently by simply capturing and delivering small changes. Organizations may optimize resource allocation while preserving data consistency and synchronization by utilizing Oracle’s change data capture. 
  • Data-Driven Decision-Making: CDC in Oracle database makes sure that decision-makers have access to the most recent information by recording and disseminating incremental changes in real-time. This increases agility and responsiveness by enabling organizations to make well-informed decisions based on timely information.

Limitations of Oracle CDC (Change Data Capture)

The limitations associated with Oracle CDC (Change Data Capture) are as follows:

  • Setting up Oracle CDC (Change Data Capture) requires numerous configuration and user permission changes to initialize and complete this process. Hence, it might be tough to implement for someone who does not have in-depth knowledge of Oracle.
  • The methods covered in this article are only about enabling and capturing change data. You still need to implement the logic for processing the change data and insert it into your target databases.
  • Implementing a complete Oracle CDC (Change Data Capture) Pipeline needs expert Oracle admin skills, target database skills, and programming skills to implement the logic. This would consume a considerable amount of engineering resources in development and maintenance.

Conclusion

This article helped you understand how you can set up Oracle CDC (Change Data Capture) with ease, including how to enable CDC in Oracle. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently.

Most businesses today, however, have an extremely high volume of data with a dynamic structure. Creating a Data Pipeline from scratch for such data is a complex process since businesses will have to utilize a high amount of resources to develop it and then ensure that it can keep up with the increased data volume and Schema variations. Businesses can instead use automated platforms like Hevo.

visit our website to explore hevo

Hevo Data helps you directly transfer data from a source of your choice to a Data Warehouse or desired destination in a fully automated and secure manner without having to write the code or export data repeatedly. It will make your life easier and make data migration hassle-free. It is User-Friendly, Reliable, and Secure.

SIGN UP for a 14-day free trial and see the difference!

mm
Software Engineer, Hevo Data

With around a decade of experience, Sarad has designed and developed fundamental components of Hevo. His expertise lies in building lean solutions for various software problems, mentoring fellow engineers and exploring new technologies"

No-code Data Pipeline For Oracle