Data is now considered to be one of the most valuable assets of any organization. It makes transactions within a business easier and facilitates a smooth flow of operations. With organizations relying on evidence-based decision-making more than ever before, data also acts as a key decision-making tool. Almost all companies today, irrespective of their market position or size, leverage the data collected to analyze their business & customers, and thereby make smarter and informed business decisions that help drive business growth and profitability.
Hence, businesses are constantly looking for ways to manage their data better. One of the most widely used practices is Change Data Capture. This article will help you understand what Change Data Capture (CDC) is and how you can set up Oracle CDC (Change Data Capture).
Table of Contents
- What is Oracle
- Key Features of Oracle
- What is Oracle CDC (Change Data Capture)
- Oracle CDC using Transaction Logs (Log-based CDC)
- The Shift from Trigger-Based to Log-Based Oracle CDC
- Change Sources and Modes of Change Data Capture
- Oracle Replication Types with CDC
- Transaction Logs in Oracle CDC
- Oracle CDC 12c: Oracle Streams Deprecation
- Oracle CDC 19c: Oracle LogMiner Continuous Mining Deprecation
- Oracle GoldenGate Replication
- Oracle CDC RAC (Real Application Clusters) Replication
- Oracle CDC for High Volume
- Oracle CDC in a Catch-Up Scenario
- Remote Log Mining for Oracle CDC
- Oracle CDC: Time Series Analytics
- Why does the process need to be accurate for Oracle CDC?
- What to look for in an Oracle Replication Tool?
- Limitations of Oracle CDC (Change Data Capture)
What is Oracle
Oracle is a Relational Database Management System (RDBMS) from Oracle Corporation. Oracle’s Relational Database Management System (RDBMS) is a multi-model system, that was primarily designed for Data Warehousing and Enterprise Grid Computing. It is now the database of choice for almost all Enterprises due to its cost-effective solution for data management. Oracle houses support for Structured Query Language (SQL), allowing users to easily interact with the database.
Oracle offers five different editions of its RDBMS from which users can choose based on their business and data requirements. These editions are as follows:
- Express Edition: This is an entry-level edition that is available free of cost for everyone to download, install, manage, develop, deploy, etc.
- Standard Edition One: This edition is considered to be suitable for business applications that require a single-server or are highly branched but require limited features.
- Standard Edition: In addition to all features in Standard Edition One, the Standard Edition houses support for Oracle Real Application Clustering Service and larger machines.
- Enterprise Edition: The Enterprise Edition offers numerous advanced features such as improved performance, advanced security, availability, and scalability. These features are required for most critical applications and business processes in which online transaction processing is involved.
- Personal Edition: Except for Oracle Real Application Clustering, the Personal Edition houses all features of the Enterprise Edition.
More information on Oracle can be found here.
Key Features of Oracle
The key features of Oracle are as follows:
- Client/Server Architecture: Oracle is based on Distributed Systems that have the same degree of data consistency and transparency as any Non-distributed Systems along with all advantages of a local Database Management System. Oracle takes full advantage of this distributed architecture by implementing a Client/Server Model. Hence, Oracle allows processing to be split into Client and Server application programs.
- Portability: Oracle can easily be ported to work on a wide variety of Operating Systems. Hence, any application developed using Oracle can be ported to any other Operating System as per requirements without making any changes to it.
- Scalability and Performance: Oracle houses numerous features like advanced Portability and Real Application Clustering that make it highly scalable. It also gives users the ability to tune the performance of their database as per requirements, allowing it to retrieve and alter data faster, thereby improving query execution time along with application operations.
- Availability: All real-time applications now require high Data Availability. Since Oracle is built on high-performing computing environments, it can be configured easily to ensure all-time Data Availability. This ensures that the data is accessible during planned, unplanned failures or downtimes.
- Backup and Recovery: Oracle houses robust recovery features that enable users to recover data from almost all kinds of database failures. In case of a failure, the database is recovered in no time, thereby ensuring high availability. Also, the unaffected parts of the database are still available even while the affected ones are getting recovered.
- Security: Oracle is considered to be the database of choice for most Enterprises and hence, stores a lot of sensitive information. The security of this data is always the top priority for obvious reasons. Oracle houses advanced in-built mechanisms that can be used to control and monitor data access and usage. It can be used to implement strong authorization policies that can help prevent unauthorized access to the data and only allow distinct access to the users as per requirements.
- Database Integrity: Oracle can ensure the consistency of all the data stored by enforcing Data Integrity. This saves a lot of engineering resources that would have been otherwise utilized in managing integrity checks.
- Concurrent Processing: Oracle allows a large number of users to execute a variety of applications concurrently.
Simplify Oracle ETL with Hevo’s No-code Data Pipeline
Hevo is a No-code Data Pipeline that offers a fully managed solution to set up data integration from Oracle and 100+ data sources (including 30+ free data sources) to numerous Business Intelligence tools, Data Warehouses, or a destination of choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully-automated solution to manage data in real-time and always have analysis-ready data.GET STARTED WITH HEVO FOR FREE
Let’s look at Some Salient Features of Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
What is Oracle CDC (Change Data Capture)
The rise in popularity of Open-Source Distributed Data Warehouse systems has led to a scenario where a combination of Oracle-based transactional database and a separate Data Warehouse is a common sight in the Enterprise ETL stack.
A common challenge found in such cases is in implementing continuous real-time sync between Oracle and the Data Warehouse so that reports always have the most recent data possible. This is accomplished using the Oracle CDC (Change Data Capture) paradigm.
Oracle CDC (Change Data Capture) is a technology that identifies and captures data added to, updated, and deleted from Oracle tables making it available for applications to consume. Oracle explains Oracle CDC (Change Data Capture) based on the concept of publishers and subscribers. Publishers are Oracle users who make the changes and Subscribers are individuals or applications that observe the changed data.
Oracle CDC using Transaction Logs (Log-based CDC)
Log-based CDC is the best Oracle CDC method. This involves minimal load on the Oracle source system and can replicate all the changes accurately. The Oracle CDC method uses transaction logs, mines the Oracle redo-logs, and the archive logs which helps Oracle recover on its own during a failure. After the logs are mined, the transactions can be applied on the destination similar to the source. This provides a real-time view of the data on the destination. All these data can be accessed on the destination or the target. Oracle CDC with transactional logs works on all Oracle versions including 12c and 19c.
The Shift from Trigger-Based to Log-Based Oracle CDC
Oracle 9i initially provided users of its database with a built-in mechanism for tracking and storing changes just like native CDC. The major drawback of this initial version of Oracle CDC was that it depended on the placement of triggers on tables in the source database. So, DBAs were reluctant to use this technology. Oracle later released Oracle 10g with database redo logs and a built-in Oracle replication tool that is used in capturing and transmitting data changes without invasive database triggers.
The new form of Oracle CDC was a log-based form of CDC that was lightweight and did not require changing the structure of source tables. Though it has been a popular feature of the database, Oracle no longer supports CDC with the release of Oracle 12c. This is done to encourage users to switch to the Oracle replication software solution called Oracle GoldenGate.
Change Sources and Modes of Change Data Capture
Change Data Capture comes with synchronous and asynchronous modes for capturing change data. In this section, you will understand how each mode of Change Data Capture is performed.
What is Synchronous Oracle CDC (Change Data Capture)
A Synchronous Oracle CDC (Change Data Capture) is implemented using triggers that inserts entries to a Change Capture table every time a data modification operation is executed. Triggers are hooks that get executed as part of the transactions. The first step is creating a user to serve as the Change Data Publisher. This user must have all access to the namespace and the table from which change data is to be captured.
Assuming this user is created, you will now go to the core part of this process, i.e., creating a Change Set, Change Table, and subscribing to changes.
Steps to Set up Synchronous Oracle CDC (Change Data Capture)
Users can set up Synchronous Oracle CDC (Change Data Capture) by implementing the following steps:
- Step 1: Creating a Change Set
- Step 2: Creating a Change Table
- Step 3: Creating a Subscription to Capture Data
- Step 4: Subscribing to Changes on Source Table
- Step 5: Activating the Subscription
- Step 6: Loading Data from Subscribed Table
- Step 7: Using Select Query to Load Changes
Step 1: Creating a Change Set
To create a change set, users can execute the following command in Oracle:
BEGIN DBMS_CDC_PUBLISH.CREATE_CHANGE_SET( change_set_name => 'PRODUCT_SET', description => 'Change set for product change info', change_source_name => 'SYNC_SOURCE'); END;
In the above query, SYNC_SOURCE is a predefined source defined by Oracle to implement Synchronous Oracle CDC (Change Data Capture).
Step 2: Creating a Change Table
Using the Change Set defined previously, create a Change Table that will capture information from the Triggers by executing the following query:
BEGIN DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE( owner => 'cdcuser', change_table_name => 'products_ct', change_set_name => 'PRODUCT_SET', source_schema => 'SH', source_table => 'PRODUCTS', column_type_list => 'ID NUMBER(6), NAME VARCHAR2(50), PRICE NUMBER(8,2)', capture_values => 'both', rs_id => 'y', row_id => 'n', user_id => 'n', timestamp => 'n', object_id => 'n', source_colmap => 'y', target_colmap => 'y', options_string => 'TABLESPACE TS_PRODUCT'); END;
In the above command, column_type_list attribute captures the list of columns for which changes will be captured.
Step 3: Creating a Subscription to Capture Data
The next step is to create a Subscription to capture the data that is being inserted to the Change Table. This can be done by executing the following query:
BEGIN DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION( change_set_name => 'PRODUCT_SET', description => 'Change data for PRODUCTS', subscription_name => 'PRODUCT_SUB'); END;
Step 4: Subscribing to Changes on Source Table
Subscribe to the source table changes using the subscription object created in the previous step by executing the following query:
BEGIN DBMS_CDC_SUBSCRIBE.SUBSCRIBE( subscription_name => 'PRODUCT_SUB', source_schema => 'SH', source_table => 'PRODUCTS', column_list => 'ID, NAME,PRICE', subscriber_view => 'PRODUCT_VIEW'); END;
Step 5: Activating the Subscription
You can run the following query to activate the subscription:
BEGIN DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION( subscription_name => 'PRODUCT_SUB'); END;
Step 6: Loading Data from Subscribed Table
Loading data from the subscribed table is done based on a collection window for the next available set of change data. Every time the following command is called, new change data that was created since the last call of this command is added to the subscription window.
BEGIN DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW( subscription_name => 'PRODUCT_SUB'); END;
Step 7: Using Select Query to Load Changes
You can now use a select query to get the new changes and process them as per the target database.
SELECT ID, NAME, PRICE FROM SALES_VIEW;
To copy data to another destination database, the above action will have to be performed using a script that has the ability to process these records and insert to the target database.
The biggest disadvantage with the above approach is that Triggers affect the performance of the source database. For cases where this cannot be afforded, Oracle provides another alternative to implement Oracle CDC (Change Data Capture).
Understanding Asynchronous Oracle CDC (Change Data Capture)
Asynchronous Oracle CDC (Change Data Capture) is implemented using Redo Logs. Redo Logs are logs that keep a track of all activities in a database. As the name suggests, this is an asynchronous process and does not cause any performance hit.
Steps to Set up Asynchronous Oracle CDC (Change Data Capture)
Users can set up Asynchronous Oracle CDC (Change Data Capture) by implementing the following steps:
- Step 1: Configuring Database
- Step 2: Enabling Supplemental Logging
- Step 3: Enabling Logging for Columns
- Step 4: Preparing Source Tables
- Step 5: Creating Change Set
- Step 6: Creating Change Table
- Step 7: Enabling Change Set
Step 1: Configuring Database
The first step is to enable FORCE LOGGING mode on your database. You can do so by executing the following query:
ALTER DATABASE FORCE LOGGING;
Step 2: Enabling Supplemental Logging
Supplemental Logging allows the database to create entries in the log for all changes to the table including UPDATE statements. It can be enabled by executing the following query:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Step 3: Enabling Logging for Columns
First, you would need to identify the columns that you need to capture changes from. If you choose to capture changes from all columns, you can run the following query:
ALTER TABLE ts.products ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
Step 4: Preparing Source Tables
You can prepare source tables for Oracle CDC (Change Data Capture) by executing the following query:
BEGIN DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(TABLE_NAME => 'ts.products'); END;
Step 5: Creating Change Set
You can create a Change Set with the source as HOTLOG_SOURCE by executing the following query:
BEGIN DBMS_CDC_PUBLISH.CREATE_CHANGE_SET( change_set_name => 'PRODUCT_SET', description => 'Change set for product info', change_source_name => 'HOTLOG_SOURCE', stop_on_ddl => 'y', begin_date => sysdate, end_date => sysdate+5); END;
HOTLOG_SOURCE is a predefined source defined by Oracle for Asynchronous Oracle CDC (Change Data Capture).
Step 6: Creating Change Table
You can create a Change Table by executing the following query:
BEGIN DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE( owner => 'cdcpub', change_table_name => 'products_ct', change_set_name => 'PRODUCT_SET', source_schema => 'TS', source_table => 'PRODUCTS', column_type_list => 'ID NUMBER(6), NAME VARCHAR2(50), PRICE NUMBER(8,2)', capture_values => 'both', rs_id => 'y', row_id => 'n', user_id => 'n', timestamp => 'n', object_id => 'n', source_colmap => 'n', target_colmap => 'y', options_string => 'TABLESPACE TS_PRODUCT_SET'); END;
Step 7: Enabling Change Set
Finally, you have to enable the Change Set for Oracle CDC (Change Data Capture) by executing the following query:
BEGIN DBMS_CDC_PUBLISH.ALTER_CHANGE_SET( change_set_name => 'PRODUCT_SET', enable_capture => 'y'); END;
That’s it, you now have set up Oracle CDC (Change Data Capture) for your tables which means that all the changed data will be captured in the changing table. However, you still need to implement a mechanism that creates a subscription to this table and process accordingly.
Oracle Replication Types with CDC
Oracle CDC using Timestamps
Audit Timestamps can be used to replicate changed records because they record the Date and Time when a record was entered or updated.
- Since deleted records cannot be detected, this technique is unsuitable for most applications.
- This method has a high overhead because queries must be conducted continuously on the source Oracle Database and tables.
- The Source Application may not reliably capture Audit Dates and Timestamp Columns, resulting in improper replication. For the CDC, there are concerns with Timestamps and Triggers.
On the tables that will be replicated, Triggers can be defined. These are triggered by every insert, update, and deletion, and the Timestamps can be gathered into another change table. The Audit Table can then be used for replication.
Triggers also increase the overheads of the Oracle Database by putting a burden on it.
Transaction Logs in Oracle CDC
The best Oracle CDC approach is log-based CDC, which places a minimal burden on the Oracle source system and accurately replicates all changes. Oracle’s CDC technique, which uses Transaction Logs, mines Oracle redo-logs and/or archive logs for its own recovery in the event of a failure. When the logs are mined, transactions can be applied to the destination in the same way that they were applied to the source, giving the destination a real-time view of the data. This data can be retrieved on the goal or destination, which is typically a Data Lake, Data Warehouse, or Analytical Appliance. On ADLS Gen2, create an Azure Data Lake.
Data may then be combined across numerous Enterprise Systems and Applications to produce rich insights that can be used to uncover significant business opportunities, improve operational efficiencies, spot key business trends, and gain a competitive edge. Since, it only requires access to the Oracle redo and archive logs, Oracle CDC with transactional logs works on all Oracle Versions, including 12c and 19c. The format of these, however, can vary significantly between Oracle Versions.
Oracle CDC 12c: Oracle Streams Deprecation
Oracle Streams was a native CDC utility for Oracle Databases that was available for free and could be used for real-time data transport. Oracle Streams can be used to automatically push Transactional Updates to a queue that can then be replicated. Oracle Streams is no longer supported as of Oracle 12c. Oracle is encouraging consumers to upgrade to Oracle GoldenGate, a premium version with substantial licensing fees.
Oracle CDC 19c: Oracle LogMiner Continuous Mining Deprecation
Oracle LogMiner is a program supplied by Oracle to Oracle Database purchasers that allows them to query logged database changes, mostly through SQL commands referencing data in Oracle redo logs. DBAs use it largely to query transaction logs. Oracle LogMiner includes a functionality that allows you to get real-time changes from Oracle redo logs. Most Third-Party CDC programs use this functionality to replicate Oracle data for their customers. Oracle has also deprecated this feature in version 19c in order to (you guessed it!) drive consumers to their paid service Oracle GoldenGate, which has significant license charges.
Oracle GoldenGate Replication
CDC is used by Oracle GoldenGate to enable real-time, heterogeneous replication from sources. It’s a replication tool that comes as an Add-on license for Oracle’s Oracle Enterprise Edition Database or can be leased separately for non-Oracle Databases.
Oracle GoldenGate, on the other hand, may require the use of other Oracle licenced products in order to provide efficient replication. To avoid Data Loss, Data Replication necessitates Table Construction on the destination based on table specifications at the source, as well as effective data type mapping. Oracle Data Integrator (ODI), Oracle’s Extract, Transform, and Load (ETL) tool, is required for this.
Oracle GoldenGate will deliver CDC (Change Data Capture) replication in real-time, with incremental updates merged on the destination. CDC is known for its high throughput and low impact on source systems. Oracle GoldenGate for Big Data is required for big data targets such as Amazon S3 or Snowflake.
Oracle Veridata is required to validate and reconcile data with the source. Oracle GoldenGate Studio will be required for a graphical user interface. Oracle GoldenGate Management Pack is required for quick alerting and monitoring capabilities (plugged into Oracle Enterprise Manager). Since all of these products must be licensed separately, using Oracle GoldenGate can be quite costly, not to mention the integration between them, which can add time and money to the process.
Oracle CDC RAC (Real Application Clusters) Replication
Oracle Real Application Clusters (RAC) is an Oracle Database Environment that is highly available and scalable. It enables customers to share storage and execute a single Oracle Ddatabase across numerous machines. If one of the cluster’s servers fails, the Database instance will continue to function on the cluster’s remaining servers or nodes. Customers can also start small and scale up as needed by adding Nodes or Servers.
In an Oracle RAC setup, the fascinating part of Oracle CDC is that each Server/Node creates a unique Transaction Log, which is identified by a thread identifier and an incremental series. The Logs must be read and applied in the order in which the servers generate them, then stitched together. The Oracle CDC procedure becomes significantly more complicated as a result of this.
Oracle CDC for High Volume
Some Oracle setups follow patterns, with the majority of transactions arriving quickly for a few hours each day before returning to a more typical pace. The difficulty with processing in these circumstances is that the CDC process must be able to tolerate huge throughput spikes and scale appropriately so that replication occurs at the same rate as when the system is quieter. In these high-throughput contexts, most CDC solutions will experience significant lag when processing CDC.
Oracle CDC in a Catch-Up Scenario
Replication cannot be conducted when there is a problem accessing the Oracle Database due to network troubles or another issue, and there is a large backlog of transactions that must be duplicated after the problem is rectified. Catch-up must be done as though the CDC were working in the “High Throughput” scenario indicated above. Otherwise, Current Transactions may be severely delayed, if not impossible to catch up with! When you add in the complexity of a RAC environment, you’ve got yourself a potential data replication nightmare.
Remote Log Mining for Oracle CDC
This is a situation in which the Oracle Source System is overburdened. In this case, any additional activity, even the smallest load, could jeopardize the Oracle source Database’s Operating Performance. It’s possible that the sole alternative is to mine log files that aren’t connected to the Oracle Source Database. Since the logs can be mined on a different server, there is no load on the source.
Oracle CDC: Time Series Analytics
On the destination Data Lake or Data Warehouse, point-in-time queries are sometimes especially crucial for Analytics. The commit date from the Oracle Source, or the Timestamp that the records were saved to Oracle, rather than the Timestamp when they are loaded to the destination, is also essential in the SCD type2 history Timestamps. Any delays in the loading of data can have a negative impact on point-in-time analytics if the commit timestamp is not used in the SCD type 2 history creation.
Why does the process needs to be accurate for Oracle CDC?
When an Application saves a Transaction, the change data for the underlying Oracle Tables are created. Maintaining referential integrity, i.e. bringing all the change data across for the transaction, is important when doing CDC. If only a portion of the data is replicated, it may cause referential integrity difficulties at the destination. Furthermore, even if the Oracle CDC process is error-free and delivers data with acceptable latency.
What to look for in an Oracle Replication Tool?
Hand-coding is much slower and more difficult than using an automated Oracle Replication Tool. It can be difficult to hand-code an Oracle CDC solution. Maintaining a hand-coded solution for future Oracle versions can also be problematic. As a result, employing an Oracle Replication Tool is a more straightforward option to consider.
Here are some things to think about while choosing an Oracle replication tool:
- Upgradation to Oracle 19c
- Tool Operations with Oracle 12c and Oracle 19c
- High Throughput
- Catch up and Recovery in the event of an Outage
- Enormous CDC volumes without difficulty
- Remote Work
- Oracle Replication in Real-Time.
Limitations of Oracle CDC (Change Data Capture)
The limitations associated with Oracle CDC (Change Data Capture) are as follows:
- Setting up Oracle CDC (Change Data Capture) requires numerous configuration and user permission changes to initialize and complete this process. Hence, it might be tough to implement for someone who does not have an in-depth knowledge of Oracle.
- The methods covered in this article are only about enabling and capturing change data. You still need to implement the logic for processing the change data and insert it into your target databases.
- Implementing a complete Oracle CDC (Change Data Capture) Pipeline needs expert Oracle admin skills, target database skills, and programming skills to implement the logic. This would consume a considerable amount of engineering resources in development and maintenance.
This article helped you understand how you can set up Oracle CDC (Change Data Capture) with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently.
Most businesses today, however, have an extremely high volume of data with a dynamic structure. Creating a Data Pipeline from scratch for such data is a complex process since businesses will have to utilize a high amount of resources to develop it and then ensure that it can keep up with the increased data volume and Schema variations. Businesses can instead use automated platforms like Hevo.visit our website to explore hevo
Hevo helps you directly transfer data from a source of your choice to a Data Warehouse or desired destination in a fully automated and secure manner without having to write the code or export data repeatedly. It will make your life easier and make data migration hassle-free. It is User-Friendly, Reliable, and Secure.
SIGN UP for a 14-day free trial and see the difference!