Introduction

The purpose of this post is to introduce you to Oracle Streams concepts. You will learn about the various components that make up the Oracle Streams technology. Towards the end, you will find practical examples detailing how to implement Oracle Streams CDC in a production environment.

What is Oracle Streams? 

Oracle Streams is a replication technology that enables real-time data integration across systems. The Streams feature was introduced in Oracle Database 9i. It allows you to track changes in the operational system(s) as they occur and then apply them to a staging database for further processing as part of an Extract/Transform/Load (ETL) process.

The Oracle Streams technology efficiently identifies and captures all data changed during a transaction, along with who made the changes. It identifies and processes only the changed data, not entire tables.

To guarantee transactional consistency for changes across multiple source tables, Oracle Streams leverages mature technologies such as:

  • Advanced Queuing
  • Log Miner
  • Triggers

With implicit capture, Oracle assumes complete change management responsibility: Oracle Streams captures and distributes database updates, events, and application messages. It automatically applies updates to destination databases, or passes events and messages to custom procedures and applications.

Oracle Streams vs Traditional CDC

  • The major difference between Oracle Streams and traditional Change Data Capture is that traditional CDC only captures the inserts, updates, and deletes made to tables, while Streams goes further and also handles event capture, transport (including transformation), and consumption.
  • Streams CDC can run synchronously (trigger-based) or asynchronously (log-based); the asynchronous mode reads the redo logs rather than adding triggers, so its impact on database performance is minimal.

Capture Configurations in Oracle Streams

There are three capture configurations, each of which provides transactionally consistent change data:

1. Synchronous Mode

  • Synchronous Capture uses triggers on the source database to capture change data. It has no latency because the change data is captured continuously and in real time on the source database. The change tables are populated when DML operations on the source table are committed. Hence it is titled “Synchronous” Change Data Capture.
  • New change data arrives automatically as DML operations on the source tables are committed. Publishers can define new change sets in the predefined SYNC_SOURCE change source or use the predefined change set, SYNC_SET. The SYNC_SET change set cannot be altered or dropped.
  • While the “invasive” nature of the synchronous mode adds overhead to the source database at capture time, this mode can reduce costs (as compared to attempting to extract change data using table differencing or change-value selection) by simplifying the extraction of change data.

2. Asynchronous HotLog & Distributed HotLog

  • In Oracle Database 10g, CDC became asynchronous. Asynchronous capture records change data after the changes have been committed to the source database by using the database redo log files.
  • With HotLog asynchronous capture, a process on the source database picks up new transactions from a redo log file. There is a brief latency between the act of committing source table transactions and the arrival of change data.

New change data arrives automatically, on a transaction-by-transaction basis from the current online redo log file. Publishers define change sets in the predefined HOTLOG_SOURCE change source.

3. Asynchronous AutoLog

New change data arrives automatically, on a log-by-log basis, as log transport services make redo log files available. Redo transport services control the automated transfer of redo log files from the source database to the staging database. Publishers define change sets in publisher-defined AutoLog change sources.
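
As a rough sketch of the transport side, AutoLog relies on a LOG_ARCHIVE_DEST_n initialization parameter on the source database that ships redo to the staging database. The service name and file template below are illustrative assumptions, not a definitive configuration:

log_archive_dest_2 = 'SERVICE=staging_db ARCH NOREGISTER VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) TEMPLATE=/usr/oracle/dbs/arch1_%t_%s_%r.dbf'
log_archive_dest_state_2 = ENABLE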

| Mode | Staging Location | Latency | Performance Impact |
| --- | --- | --- | --- |
| Synchronous | Must be the same as the source database | None | Adds overhead to source database transactions |
| Asynchronous HotLog | Must be the same as the source database | Populated as new committed transactions arrive | Minimal impact on the source database for supplemental logging and capture |
| Asynchronous Distributed HotLog | Split between the source and staging databases | Populated as new committed transactions arrive | Minimal impact on the source database for supplemental logging and capture |
| Asynchronous AutoLog | Typically remote from the source database | Depends on redo log switch frequency | When remote, the least impact on source database performance |

How Synchronous CDC Works

Synchronous CDC uses internal triggers to capture before and/or after images of new and updated rows. Triggers are placed on the tables in your source database, and source database transactions do not commit until the change data has been written to the change table. The performance implications are the same as capture via user-defined triggers.
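
To make this concrete, here is a minimal sketch of publishing a table synchronously through the predefined SYNC_SET change set; the publisher name cdcpub and the column list are illustrative assumptions:

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
       owner             => 'cdcpub',        -- hypothetical publisher user
       change_table_name => 'contacts_sync_ct',
       change_set_name   => 'SYNC_SET',      -- predefined synchronous change set
       source_schema     => 'SH',
       source_table      => 'CONTACTS',
       column_type_list  => 'CONTACT_NAME VARCHAR2(50), EMAIL_ADDRESS VARCHAR2(15)',
       capture_values    => 'both',
       rs_id             => 'y',
       row_id            => 'n',
       user_id           => 'n',
       timestamp         => 'n',
       object_id         => 'n',
       source_colmap     => 'y',
       target_colmap     => 'y',
       options_string    => NULL);
END;
/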

How Asynchronous CDC Works

Asynchronous Change Data Capture generates components of Oracle Streams to capture change data and to populate change sets. These components must not be reused or shared for any other purposes. For example, the capture queue that Change Data Capture generates for a Distributed HotLog change source should not be used as the source queue for new user-created Streams propagations.

Messages in a staging area are consumed by the apply engine, where the changes they represent are applied to a database, or they are consumed by an application. The Oracle Streams apply engine works on the following:

  • DML changes
  • DDL changes

The apply engine runs locally on the system hosting the Oracle database. Asynchronous Change Data Capture requires both the source and staging databases to be Oracle Database Enterprise Edition (release 10g or later). In HotLog configurations, the capture process runs on the source database.

CDC Publish & Subscribe

The publisher supplies the change data and the subscribers consume change data. This model allows sharing of change data across users and applications.

This mechanism coordinates the retention/purge of change data. It prevents applications from accidentally processing change data more than once.

Oracle Streams Concepts

Before diving into the steps, let’s understand some concepts related to Oracle Streams.

  • Capture Process: Captures data changes in the source database as they occur.
  • Change Set: A collection of source tables for which transactionally consistent change data is needed.
  • Change Table: The container that receives the change data; it is published to subscribers.
  • Apply Process: Applies the captured changes at the destination (staging) database.
  • Streams Administrator: A database user set up to perform data replication across Oracle databases. It can be given any name, and the necessary administrative privileges must be granted to it.

Ensure Data Integrity with Hevo’s No-code Data Pipeline

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.


Steps to Set Up Oracle Streams CDC

For setting up Oracle Streams CDC, you can capture change data asynchronously using:

  1. HotLog Mode
  2. AutoLog Mode

In this post, you will learn how to set up Oracle Streams, redo logging, and Change Data Capture for asynchronous Distributed HotLog publishing.

These steps should be performed on both the source database and the staging database by the respective DBAs. This guide assumes that the source database DBA, the staging database DBA, and the two publishers are four different people. The guide also assumes that you are running Oracle Database release 11g.

Here is a high level overview of the process:

  • The source database DBA will:
    • Set up Oracle Net Services.
    • Initialize database parameters.
    • Enable force logging and supplemental logging on the source database.
    • Create the source database publisher.
  • The staging database DBA will:
    • Set up database initialization parameters.
    • Create a staging database publisher.
  • The source database publisher will then set up a database link from the source database to the staging database.
  • The staging database publisher will:
    • Set up a database link from the staging database to the source database.
    • Create new change sources, change sets, and the change tables.
    • Grant all necessary privileges to subscribers.

Let’s go through the steps in detail:

Step 1 — Source Database DBA: Prepare the Source Database

Set up the database initialization parameters on the source database.

compatible = 11.0
global_names = true
job_queue_processes = <current value> + 2
open_links = 4
parallel_max_servers = <current value> + 3
processes = <current value> + 4
sessions = <current value> + 1
streams_pool_size = <current value> + 20 MB
undo_retention = 3600

In this snippet, you are specifying that the source database is Oracle Database release 11g (compatible = 11.0) and sizing the parameters for a single Distributed HotLog change source: each change source adds to job_queue_processes, parallel_max_servers, processes, sessions, and streams_pool_size (roughly 20 MB per change source).
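
As a minimal sketch of applying these settings, assuming the instance uses an spfile and purely illustrative current values, the dynamic parameters can be set with ALTER SYSTEM, while static ones need SCOPE=SPFILE and an instance restart:

-- Dynamic parameters: take effect immediately and persist in the spfile
ALTER SYSTEM SET global_names = TRUE SCOPE=BOTH;
ALTER SYSTEM SET job_queue_processes = 12 SCOPE=BOTH;  -- assumed current value 10, plus 2
ALTER SYSTEM SET undo_retention = 3600 SCOPE=BOTH;
ALTER SYSTEM SET streams_pool_size = 70M SCOPE=BOTH;   -- assumed current value 50 MB, plus 20 MB

-- Static parameters: recorded in the spfile, applied at the next restart
ALTER SYSTEM SET open_links = 4 SCOPE=SPFILE;
ALTER SYSTEM SET processes = 154 SCOPE=SPFILE;         -- assumed current value 150, plus 4
ALTER SYSTEM SET sessions = 171 SCOPE=SPFILE;          -- assumed current value 170, plus 1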

Step 2 — Staging Database DBA: Set the Database Initialization Parameters

The staging database DBA sets the database initialization parameters as follows.

compatible = 11.0
global_names = true
java_pool_size = 50000000
open_links = 4
job_queue_processes = 2
parallel_max_servers = <current value> + 2
processes = <current value> + 3
sessions = <current value> + 1
streams_pool_size = <current value> + 11 MB
undo_retention = 3600

Step 3 — Source Database DBA: Alter the Source Database

1. First, place the database in ARCHIVELOG mode by shutting it down and then restarting it with the following commands:

shutdown immediate
startup mount
alter database archivelog;
alter database open;

2. Place the database into FORCE LOGGING mode.

ALTER DATABASE FORCE LOGGING;

3. Enable supplemental logging.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

4. Create an unconditional log group on all columns to be captured in the source table. 

ALTER TABLE sh.contacts ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

In this example, you are logging all columns of the sh.contacts table. The source database DBA will have to repeat this step for every source table for which change tables should be created.
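
To verify the logging setup before moving on, you can sanity-check the data dictionary; FORCE_LOGGING and SUPPLEMENTAL_LOG_DATA_MIN are standard columns of V$DATABASE, and table-level log groups appear in DBA_LOG_GROUPS:

-- Confirm force logging and minimal supplemental logging at the database level
SELECT force_logging, supplemental_log_data_min FROM v$database;

-- Confirm the unconditional log group on the source table
SELECT log_group_name, table_name, always
  FROM dba_log_groups
 WHERE owner = 'SH' AND table_name = 'CONTACTS';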

Step 4 — Source Database DBA: Create and Grant Privileges to the Publisher

CREATE USER source_streamspub IDENTIFIED EXTERNALLY
   QUOTA UNLIMITED ON SYSTEM
   QUOTA UNLIMITED ON SYSAUX;
GRANT CREATE SESSION TO source_streamspub;
GRANT DBA TO source_streamspub;
GRANT CREATE DATABASE LINK TO source_streamspub;
GRANT EXECUTE on DBMS_CDC_PUBLISH TO source_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO source_streamspub;
GRANT SELECT_CATALOG_ROLE TO source_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(
      GRANTEE=> 'source_streamspub');

The source database DBA creates a publisher user (in this case, source_streamspub) and grants it the privileges it needs, including the permission to set up a database link from the source database to the staging database.

Step 5 — Staging Database DBA: Create and Grant Privileges to the Publisher

CREATE USER staging_streamspub IDENTIFIED EXTERNALLY
   DEFAULT TABLESPACE ts_streamspub
   QUOTA UNLIMITED ON SYSTEM;
GRANT CREATE SESSION TO staging_streamspub;
GRANT CREATE TABLE TO staging_streamspub;
GRANT CREATE TABLESPACE TO staging_streamspub;
GRANT UNLIMITED TABLESPACE TO staging_streamspub;
GRANT SELECT_CATALOG_ROLE TO staging_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO staging_streamspub;
GRANT CONNECT, RESOURCE, DBA TO staging_streamspub;
GRANT CREATE SEQUENCE TO staging_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(grantee => 'staging_streamspub');
GRANT CREATE DATABASE LINK TO staging_streamspub;

Here the staging database DBA creates a user, staging_streamspub, and grants it the privileges it needs to create change sets and change tables on the staging database in the later steps.

Step 6 — Source Database Publisher: Create Database Link

CREATE DATABASE LINK staging_db
CONNECT TO staging_streamspub IDENTIFIED BY Stg406V5
USING 'staging_db';

In this example, you, the source database publisher, are creating a link from the source database to the staging database. 
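
A quick way to confirm the link works is to query across it. Because global_names = true, the link name must match the global name of the staging database:

SELECT * FROM dual@staging_db;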

Step 7 — Staging Database Publisher: Create Database Link

Create a link from the staging database to the source database. 

CREATE DATABASE LINK source_db
CONNECT TO source_streamspub IDENTIFIED BY Lvh256B2
USING 'source_db';

Step 8 — Staging Database Publisher: Identify the Change Source Database and Create the Change Sources

The following command creates a Distributed HotLog change source, named MAILING_LIST, on the source database.

BEGIN
   DBMS_CDC_PUBLISH.CREATE_HOTLOG_CHANGE_SOURCE(
       change_source_name => 'MAILING_LIST',
       description => 'test source',
       source_database => 'source_db');
END;
/

Step 9 — Staging Database Publisher: Create Change Sets

The following command creates a change set called DAILY_MAILING_LIST that captures changes starting today and continues capturing change data indefinitely.

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
       change_set_name => 'DAILY_MAILING_LIST',
       description => 'change set for contact info',
       change_source_name => 'MAILING_LIST',
       stop_on_ddl => 'y');
END;
/

Step 10 — Staging Database Publisher: Create the Change Tables

The following command creates a change table on the staging database that captures changes made to the source table (sh.contacts) in the source database.

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
       owner             => 'staging_streamspub',
       change_table_name => 'contacts_ct',
       change_set_name   => 'DAILY_MAILING_LIST',
       source_schema     => 'SH',
       source_table      => 'CONTACTS',
       column_type_list  => 'PHONE_NUMBER NUMBER(10), CONTACT_NAME VARCHAR2(50),
           ZIP_CODE NUMBER(5), EMAIL_ADDRESS VARCHAR2(15), HOME_ADDRESS VARCHAR2(20)',
       capture_values    => 'both',
       rs_id             => 'y',
       row_id            => 'n',
       user_id           => 'n',
       timestamp         => 'n',
       object_id         => 'n',
       source_colmap     => 'n',
       target_colmap     => 'y',
       options_string    => 'TABLESPACE TS_DAILY_MAILING_LIST');
END;
/
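
Once capture is enabled (Step 11 below), the publisher can peek at what lands in the change table. The control columns reflect the options chosen above; for example, OPERATION$ and CSCN$ are always present, and RSID$ exists here because rs_id => 'y'. A sketch:

SELECT operation$, cscn$, rsid$, contact_name, email_address
  FROM staging_streamspub.contacts_ct
 ORDER BY cscn$, rsid$;

Because capture_values => 'both', each UPDATE appears as two rows, the before image and the after image. Subscribers, however, normally read change data through subscriber views (Step 12) rather than querying the change table directly.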

Step 11 — Staging Database Publisher: Enable the Change Source

Alter the MAILING_LIST change source to enable it.

BEGIN
  DBMS_CDC_PUBLISH.ALTER_HOTLOG_CHANGE_SOURCE(
    change_source_name => 'MAILING_LIST',
    enable_source      => 'Y');
END;
/

Now, alter the DAILY_MAILING_LIST change set to enable capture.

BEGIN
   DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
       change_set_name => 'DAILY_MAILING_LIST',
       enable_capture  => 'y');
END;
/

Step 12 — Staging Database Publisher: Grant Access to Subscribers

The publisher should grant a subscriber access to change data so that they can create subscriptions. The following command assumes that the user subscriber1 already exists.

GRANT SELECT ON staging_streamspub.contacts_ct TO subscriber1;
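
To round out the flow, here is a minimal sketch of how subscriber1 might consume the change data through the DBMS_CDC_SUBSCRIBE package; the subscription and view names are illustrative assumptions:

BEGIN
   -- Create a subscription on the change set and subscribe to source columns
   DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
       change_set_name   => 'DAILY_MAILING_LIST',
       description       => 'Subscription to contact changes',
       subscription_name => 'CONTACTS_SUB');
   DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
       subscription_name => 'CONTACTS_SUB',
       source_schema     => 'SH',
       source_table      => 'CONTACTS',
       column_list       => 'CONTACT_NAME, EMAIL_ADDRESS',
       subscriber_view   => 'CONTACTS_VIEW');
   -- Activate the subscription, then extend the window to see new change data
   DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'CONTACTS_SUB');
   DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'CONTACTS_SUB');
END;
/

-- Read the current window through the subscriber view, then purge it when done
SELECT * FROM contacts_view;
EXECUTE DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'CONTACTS_SUB');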

Conclusion

You’ve learnt how to capture change data from one Oracle database into another using Oracle Streams, including how to set up redo logging and asynchronous Distributed HotLog publishing.

Check out Oracle Streams Concepts and Administration for more information on Oracle Streams.

While Oracle Streams provides a platform for asynchronous CDC for your Oracle database, it is quite effort-intensive and requires in-depth technical expertise in working with the Oracle database. If you want to automate the process of CDC in your Oracle database, Hevo Data is the best product for you.

It will help you save on engineering bandwidth and costs while seamlessly enabling Oracle CDC without the requirement of any manual intervention.

Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.

Vivek Sinha
Director of Product Management, Hevo Data

Vivek Sinha is a seasoned product leader with over 10 years of expertise in revolutionizing real-time analytics and cloud-native technologies. He specializes in enhancing Apache Pinot, focusing on query processing and data mutability. Vivek is renowned for his strategic vision and ability to deliver cutting-edge solutions that empower businesses to harness the full potential of their data.
