Oracle Streams CDC: Detailed Guide

on Data Integration, Tutorials • June 11th, 2020 • Write for Hevo

Introduction

The purpose of this post is to introduce you to Oracle Streams concepts. You will learn about the various components that make up the Oracle Streams technology. Towards the end, you will find practical examples detailing how to implement Oracle Streams CDC in a production environment.

This post will take you through the following aspects in detail.

What is Oracle Streams? 

Oracle Streams is a replication technology that enables real-time data integration across systems. The Streams feature was introduced in the Oracle database in version 9i. It allows you to track change data from the operational system(s) as they occur and then apply that in a staging database for further processing as part of an Extract/Transform/Load (ETL) process. The Oracle Streams technology efficiently identifies and captures all data that has been changed during a transaction and who made them. It quickly identifies and processes only the changed data: Not entire tables. 

To guarantee transactional consistency for changes across multiple source tables, Oracle Streams leverages mature technologies such as:

  • Advanced Queuing
  • Log Miner
  • Triggers

Oracle assumes complete change management responsibility by applying implicit capture to automatically capture changes. Oracle Streams will capture and distribute database updates, events, and application messages. It will automatically apply updates to destination databases, or pass events and messages to custom procedures and applications.

Oracle Streams vs Traditional CDC

The major difference between Oracle Streams and traditional Change Data Capture is that CDC captures inserts, updates, and deletes made to tables while on the other hand Streams goes even further by capturing event capture, transport (transformation) and consumption.

This log-based Stream CDC happens synchronously or asynchronously and therefore it does not impact database performance.

A Simpler Alternative to Oracle Streams CDC – Hevo Data

Hevo Data, a No-code Data Pipeline, provides you a platform to connect your Oracle database to any data warehouse. It helps you migrate data in real time and provides a state-of-the-art infrastructure to ensure reliable change data capture. Some of the salient features of Hevo include:

  1. Fast Implementation
  2. Fully Managed Service & Live Support
  3. Real-time Data Transfer
  4. 100% Complete & Accurate Data Transfer
  5. Scalable Infrastructure
  6. A Wide Array of Data Source Integrations

Get started with Hevo by signing up for a free 14-day trial!

Capture Configurations in Oracle Streams

There are 3 different change sets that guarantee transactionally consistent operations:

1. Synchronous Mode

Synchronous Capture uses triggers on the source database to capture change data. It has no latency because the change data is captured continuously and in real-time on the source database. The change tables are populated when DML operations on the source table are committed. Hence it is titled “Synchronous” Change Data Capture.

New change data arrives automatically as DML operations on the source tables are committed. Publishers can define new change sets in the predefined SYNC_SOURCE change source or use the predefined change set, SYNC_SET. The SYNC_SET change set cannot be altered or dropped.

While the “invasive” nature of the approach of the synchronous mode of Change Data Capture adds overhead to the source database at capture time, this mode can reduce costs (as compared to attempting to extract change data using table differencing or change-value section) by simplifying the extraction of change data. 

2. Asynchronous & Distribution HotLog

In Oracle Database 10g, CDC became asynchronous. Asynchronous capture records change data after the changes have been committed to the source database by using the database redo log files. With HotLog asynchronous capture, a process on the source database picks up new transactions from a redo log file. There is a brief latency between the act of committing source table transactions and the arrival of change data.

New change data arrives automatically, on a transaction-by-transaction basis from the current online redo log file. Publishers define change sets in the predefined HOTLOG_SOURCE change source.

3. Asynchronous AutoLog

New change data arrives automatically, on a log-by-log basis, as log transport services make redo log files available. Redo transport services control the automated transfer of redo log files from the source database to the staging database. Publishers define change sets in publisher-defined AutoLog change sources.

ModeStaging LocationLatencyPerformance Impact
SynchronousMust be the same as the Source DatabaseNoneAdds overhead to Source Database transactions
Asynchronous HotLogMust be the same as the Source DatabasePopulated as new committed transactions arriveMinimal impact on Source Database for supplemental logging and capture
Asynchronous Distributed HotLogSplits between Source and Staging DatabasesPopulated as new committed transactions arriveMinimal impact on Source Database for supplemental logging and capture
Asynchronous AutoLogTypically remote from the Source DatabaseDepends on redo log switch frequencyWhen remote, it has the least impact on performance

How Synchronous CDC Works

Synchronous CDC uses internal triggers to capture before and/or after images of new and updated rows. The tables in your source database are modified to put triggers on, and the source database transactions will not commit until the changes are propagated over to the target table. It has the same performance implications as capture via user triggers.

How Asynchronous CDC Works

Asynchronous Change Data Capture generates components of Oracle Streams to capture change data and to populate change sets. These components must not be reused or shared for any other purposes. For example, the capture queue that Change Data Capture generates for a Distributed HotLog change source should not be used as the source queue for new user-created Streams propagations.

Messages in a staging area are consumed by the apply engine, where the changes they represent are applied to a database, or they are consumed by an application. The Oracle Streams apply engine works on the following:

  • DML changes
  • DDL changes

The apply engine runs locally on the system hosting the Oracle database. Asynchronous change data capture requires both the source and staging databases to be Oracle Database 10g Enterprise. The source database is where you apply the capture process. 

CDC Publish & Subscribe

The publisher supplies the change data and the subscribers consume change data. This model allows sharing of change data across users and applications.

This mechanism coordinates the retention/purge of change data. It prevents applications from accidentally processing change data more than once.

Oracle Streams Concepts

Before diving into the steps, let’s understand some concepts related to Oracle Streams.

  • Capture Process: Catches all real-time data changes in the source database system.
  • Change Set: Collection of source tables for which transactionally consistent change data is needed.
  • Change Table: It is the container that receives change data. It is published to subscribers.
  • Apply Process: Propagates captured data to the system.
  • Streams Administrator: It is a DB role used to perform data replication across Oracle databases. It can be assigned any name and privileges must be granted to the Stream admin user.

Steps to Set Up Oracle Streams CDC

For setting up Oracle Streams CDC, you can capture change data asynchronously using:

  1. HotLog Mode
  2. AutoLog Mode

In this post, you will learn how to set up Oracle Streams, redo logging, and Change Data Capture for asynchronous distributed HotLog publishing. Here is an illustration of the Stream components of an async distributed HotLog system.

Oracle Streams: Async HotLog CDC

These steps should be performed on both the source database and the staging database by the respective DBAs. This guide assumes that the source database DBA, the staging database DBA, and the publishers are four different people. The guide also assumes that you have Oracle Database release 11g.

Here is a high level overview of the process:

  • The source database DBA will:
    • Setup Oracle Net Services.
    • Initialize database parameters.
    • Enable force logging and supplemental logging on the source database.
    • Create the source database publisher.
  • The staging database DBA will:
    • Setup database initialization parameters.
    • Create a staging database publisher.
  • The source database publisher will then set up a database link from the source database to the staging database.
  • The staging database publisher will:
    • Set up a database link from the staging database to the source database.
    • Create new change sources, change sets, and the change tables.
    • Grant all necessary privileges to subscribers.

Let’s go through the steps in detail:

Step 1 — Source Database DBA: Prepare the Source Database

Set up the database initialization parameters on the source database.

compatible = 11.0
global_names = true
job_queue_processes = <current value> + 2
open_links = 4
parallel_max_servers = <current value> + 3
processes = <current value> + 4
sessions = <current value> + 1
streams_pool_size = <current value> + 20 MB
undo_retention = 3600

In this code snippet, you are specifying that the source database is Oracle Database release 11g, the number of change data sources as 1, and the value of the STREAMS_POOL_SIZE parameter to be 50 MB.

Step 2 — Staging Database DBA: Set the Database Initialization Parameters

The staging datbase DBA is going to initialize the database parameters as follows.

compatible = 11.0
global_names = true
java_pool_size = 50000000
open_links = 4
job_queue_processes = 2
parallel_max_servers = <current_value> + 2
processes = <current_value> + 3
sessions = <current value> + 1
streams_pool_size = <current_value> + 11 MB
undo_retention = 3600

Step 3 — Source Database DBA: Alter the Source Database

1. To do this, you first need to place the database in ARCHIVELOG mode by shutting down the database and then restarting it using the following commands:

shutdown immediate
startup mount
alter database archivelog;
alter database open;

2. Place the database into FORCE LOGGING logging mode.

ALTER DATABASE FORCE LOGGING;

3. Enable supplemental logging.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

4. Create an unconditional log group on all columns to be captured in the source table. 

ALTER TABLE sh.contacts ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

In this example, you are capturing all rows in the sh.contacts table. The source database DBA will have to repeat this step for every source table for which change tables should be created.

Step 4 — Source Database DBA: Create and Grant Privileges to the Publisher

CREATE USER source_streamspub IDENTIFIED EXTERNALLY
   QUOTA UNLIMITED ON SYSTEM
   QUOTA UNLIMITED ON SYSAUX;
GRANT CREATE SESSION TO source_streamspub;
GRANT DBA TO source_streamspub;
GRANT CREATE DATABASE LINK TO source_streamspub;
GRANT EXECUTE on DBMS_CDC_PUBLISH TO source_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO source_streamspub;
GRANT SELECT_CATALOG_ROLE TO source_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(
      GRANTEE=> 'source_streamspub');

The source database DBA creates a data publisher user (in this case, source_streamspub) and grants privileges to this user so that they have the necessary permissions required to set up a database link from the source database to the staging database.

Step 5 — Staging Database DBA: Create and Grant Privileges to the Publisher

CREATE USER staging_streamspub IDENTIFIED EXTERNALLY
   DEFAULT TABLESPACE ts_streamspub
   QUOTA UNLIMITED ON SYSTEM;
GRANT CREATE SESSION TO staging_streamspub;
GRANT CREATE TABLE TO staging_streamspub;
GRANT CREATE TABLESPACE TO staging_streamspub;
GRANT UNLIMITED TABLESPACE TO staging_streamspub;
GRANT SELECT_CATALOG_ROLE TO staging_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO staging_streamspub;
GRANT CONNECT, RESOURCE, DBA TO staging_streamspub;
GRANT CREATE SEQUENCE TO staging_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(grantee => 'staging_streamspub');
GRANT CREATE DATABASE LINK TO staging_streamspub;

Here the staging database DBA creates a user, staging_streamspub, grants permissions and also creates change sets and change tables on the staging database.

Step 6 — Source Database Publisher: Create Database Link

CREATE DATABASE LINK staging_db
CONNECT TO staging_streamspub IDENTIFIED BY Stg406V5
USING 'staging_db'

In this example, you, the source database publisher, are creating a link from the source database to the staging database. 

Step 7 — Staging Database Publisher: Create Database Link

Create a link from the staging database to the source database. 

CREATE DATABASE LINK source_db
CONNECT TO source_streamspub IDENTIFIED BY Lvh256B2
USING 'source_db';

Step 8 — Staging Database Publisher: Identify the Change Source Database and Create the Change Sources

The following command creates the Distributed HotLog change sources on the source database. 

BEGIN
   DBMS_CDC_PUBLISH.CREATE_HOTLOG_CHANGE_SOURCE(
       change_source_name => 'MAILING_LIST',
       description => 'test source',
       source_database => 'source_db');
END;
/

Step 9 — Staging Database Publisher: Create Change Sets

The following command creates a change set called DAILY_MAILING_LIST that captures changes starting today and continues capturing change data indefinitely.

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
       change_set_name => 'DAILY_MAILING_LIST',
       description => 'change set for contact info',
       change_source_name => 'MAILING_LIST',
       stop_on_ddl => 'y');
END;
/

Step 10 — Staging Database Publisher: Create the Change Tables

The following command creates a change table on the staging database that captures changes made to the source table (sh.contacts) in the source database.

BEGIN
   DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
       owner             => 'staging_streamspub',
       change_table_name => 'contacts_ct',
       change_set_name   => 'DAILY_MAILING_LIST',
       source_schema     => 'SH',
       source_table      => 'CONTACTS',
       column_type_list  => 'PHONE_NUMBER NUMBER(10), CONTACT_NAME VARCHAR2(50),
          ZIP_CODE NUMBER(5),
          EMAIL_ADDRESS VARCHAR2(15), HOME_ADDRESS VARCHAR(20)',
       capture_values    => 'both',
       rs_id             => 'y',
       row_id            => 'n',
       user_id           => 'n',
       timestamp         => 'n',
       object_id         => 'n',
       source_colmap     => 'n',
       target_colmap     => 'y',
       options_string    => 'TABLESPACE TS_DAILY_MAILING_LIST');
END;
/

Step 11 — Staging Database Publisher: Enable the Change Source

Alter the change source to enable distributed HotLogs. 

BEGIN
  DBMS_CDC_PUBLISH.ALTER_HOTLOG_CHANGE_SOURCE(
    change_source_name => 'DAILY_MAILING_LIST',
    enable_source      => 'Y');
END;
/

Now, alter the change source to enable asynchronous change sets. 

BEGIN
   DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
       change_set_name => 'DAILY_MAILING_LIST',
       enable_capture  => 'y');
END;
/

Step 12 — Staging Database Publisher: Grant Access to Subscribers

The publisher should grant a subscriber access to change data so that they can create subscriptions. The following command assumes that the user subscriber1 already exists.

GRANT SELECT ON staging_cdcpub.products_ct TO subscriber1;

Conclusion

You’ve learnt how to capture change data from one Oracle Database into another using Oracle Streams. You learnt how to set up redo logging and asynchronous distributed HotLog publishing.

Check out Oracle Streams Concepts and Administration for more information on Oracle Streams.

While Oracle Streams provides a platform for asynchronous CDC for your Oracle database, it is quite effort-intensive and requires in-depth technical expertise in working with Oracle database. If you want to automate the process of CDC in your Oracle database, Hevo Data is the best product for you. It will help you save on engineering bandwidth and costs while seamlessly enabling Oracle CDC without the requirement of any manual intervention.

Give Hevo a try by signing up for a free 14-day trial!

No-code Data Pipeline for your data warehouse