The purpose of this post is to introduce you to Oracle Streams concepts. You will learn about the various components that make up the Oracle Streams technology. Towards the end, you will find practical examples detailing how to implement Oracle Streams CDC in a production environment.
What is Oracle Streams?
Oracle Streams is a replication technology that enables real-time data integration across systems. The Streams feature was introduced in the Oracle database in version 9i. It allows you to track change data from the operational system(s) as they occur and then apply that in a staging database for further processing as part of an Extract/Transform/Load (ETL) process.
The Oracle Streams technology efficiently identifies and captures all data changed during a transaction, along with information about who made the changes. It identifies and processes only the changed data, not entire tables.
To guarantee transactional consistency for changes across multiple source tables, Oracle Streams leverages mature technologies such as:
- Advanced Queuing
- Log Miner
- Triggers
Oracle assumes complete change management responsibility by applying implicit capture to automatically capture changes. Oracle Streams will capture and distribute database updates, events, and application messages. It will automatically apply updates to destination databases, or pass events and messages to custom procedures and applications.
Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo’s automated, no-code platform empowers you with everything you need for a smooth Oracle data replication experience.
Check out what makes Hevo amazing:
- Live Monitoring & Support: Hevo provides live data flow monitoring and 24/5 customer support via chat, email, and calls.
- Secure & Reliable: Hevo’s fault-tolerant architecture ensures secure, consistent data handling with zero loss and automatic schema management.
- User-Friendly & Scalable: Hevo’s simple UI makes it easy for new users, while its horizontal scaling manages growing data volumes with minimal latency.
Oracle Streams vs Traditional CDC
- The major difference between Oracle Streams and traditional Change Data Capture is that traditional CDC only captures the inserts, updates, and deletes made to tables, while Streams goes further by also handling event capture, transport (including transformation), and consumption.
- Streams CDC can run synchronously (trigger-based) or asynchronously (log-based); in the asynchronous, log-based mode, the impact on source database performance is minimal.
Capture Configurations in Oracle Streams
There are three capture configurations, each of which guarantees transactionally consistent operations:
1. Synchronous Mode
- Synchronous Capture uses triggers on the source database to capture change data. It has no latency because change data is captured continuously, in real time, on the source database. The change tables are populated when DML operations on the source table are committed; hence the name “Synchronous” Change Data Capture.
- New change data arrives automatically as DML operations on the source tables are committed. Publishers can define new change sets in the predefined SYNC_SOURCE change source or use the predefined change set, SYNC_SET. The SYNC_SET change set cannot be altered or dropped.
- While the invasive nature of the synchronous mode of Change Data Capture adds overhead to the source database at capture time, this mode can reduce costs (compared with extracting change data using table differencing or change-value selection) by simplifying the extraction of change data.
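As a sketch of synchronous publishing (the publisher schema, change table name, and column list here are illustrative, not from this guide), a publisher could create a change table in the predefined SYNC_SET change set; committed DML on the source table then populates it immediately:

BEGIN
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'cdcpub',            -- hypothetical publisher schema
    change_table_name => 'contacts_sync_ct',
    change_set_name   => 'SYNC_SET',          -- predefined synchronous change set
    source_schema     => 'SH',
    source_table      => 'CONTACTS',
    column_type_list  => 'CONTACT_NAME VARCHAR2(50), PHONE_NUMBER NUMBER(10)',
    capture_values    => 'both',              -- capture both old and new values
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'n',
    object_id         => 'n',
    source_colmap     => 'y',                 -- permitted in synchronous mode
    target_colmap     => 'y',
    options_string    => NULL);
END;
/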
2. Asynchronous HotLog & Distributed HotLog
- In Oracle Database 10g, CDC became asynchronous. Asynchronous capture records change data after the changes have been committed to the source database by using the database redo log files.
- With HotLog asynchronous capture, a process on the source database picks up new transactions from a redo log file. There is a brief latency between the act of committing source table transactions and the arrival of change data.
New change data arrives automatically, on a transaction-by-transaction basis from the current online redo log file. Publishers define change sets in the predefined HOTLOG_SOURCE change source.
3. Asynchronous AutoLog
New change data arrives automatically, on a log-by-log basis, as log transport services make redo log files available. Redo transport services control the automated transfer of redo log files from the source database to the staging database. Publishers define change sets in publisher-defined AutoLog change sources.
| Mode | Staging Location | Latency | Performance Impact |
| --- | --- | --- | --- |
| Synchronous | Must be the same as the source database | None | Adds overhead to source database transactions |
| Asynchronous HotLog | Must be the same as the source database | Populated as new committed transactions arrive | Minimal impact on source database for supplemental logging and capture |
| Asynchronous Distributed HotLog | Split between the source and staging databases | Populated as new committed transactions arrive | Minimal impact on source database for supplemental logging and capture |
| Asynchronous AutoLog | Typically remote from the source database | Depends on redo log switch frequency | When remote, it has the least impact on source database performance |
How Synchronous CDC Works
Synchronous CDC uses internal triggers to capture before and/or after images of new and updated rows. Triggers are placed on the source tables, and a source database transaction does not commit until the change data has been written to the change table. It has the same performance implications as capture via user-defined triggers.
How Asynchronous CDC Works
Asynchronous Change Data Capture generates components of Oracle Streams to capture change data and to populate change sets. These components must not be reused or shared for any other purposes. For example, the capture queue that Change Data Capture generates for a Distributed HotLog change source should not be used as the source queue for new user-created Streams propagations.
Messages in a staging area are consumed by the apply engine, where the changes they represent are applied to a database, or they are consumed by an application. The apply engine runs locally on the system hosting the Oracle database. Asynchronous Change Data Capture requires both the source and staging databases to be Oracle Database 10g (or later) Enterprise Edition, and the capture process runs on the source database.
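To see the Streams capture and apply components that asynchronous CDC has generated, a DBA can query the standard Streams dictionary views (a sketch; the columns shown are a subset of what these views expose):

-- Capture processes and whether they are local or downstream
SELECT capture_name, capture_type, status FROM dba_capture;
-- Apply processes and their current status
SELECT apply_name, status FROM dba_apply;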
CDC Publish & Subscribe
The publisher supplies the change data and the subscribers consume change data. This model allows sharing of change data across users and applications.
This mechanism coordinates the retention/purge of change data. It prevents applications from accidentally processing change data more than once.
Oracle Streams Concepts
Before diving into the steps, let’s understand some concepts related to Oracle Streams.
- Capture Process: Catches all real-time data changes in the source database system.
- Change Set: Collection of source tables for which transactionally consistent change data is needed.
- Change Table: It is the container that receives change data. It is published to subscribers.
- Apply Process: Applies the captured change data to the change tables on the staging system.
- Streams Administrator: A database user configured to perform data replication across Oracle databases. The user can be given any name, but the necessary administrative privileges must be granted to it.
Steps to Set Up Oracle Streams CDC
For setting up Oracle Streams CDC, you can capture change data asynchronously using:
- HotLog Mode
- AutoLog Mode
In this post, you will learn how to set up Oracle Streams, redo logging, and Change Data Capture for asynchronous distributed HotLog publishing. Here is an illustration of the Stream components of an async distributed HotLog system.
These steps should be performed on both the source database and the staging database by the respective DBAs. This guide assumes that the source database DBA, the staging database DBA, the source database publisher, and the staging database publisher are four different people. The guide also assumes that you have Oracle Database release 11g.
Here is a high level overview of the process:
- The source database DBA will:
- Set up Oracle Net Services.
- Initialize database parameters.
- Enable force logging and supplemental logging on the source database.
- Create the source database publisher.
- The staging database DBA will:
- Set up database initialization parameters.
- Create a staging database publisher.
- The source database publisher will then set up a database link from the source database to the staging database.
- The staging database publisher will:
- Set up a database link from the staging database to the source database.
- Create new change sources, change sets, and the change tables.
- Grant all necessary privileges to subscribers.
Let’s go through the steps in detail:
Step 1 — Source Database DBA: Prepare the Source Database
Set up the database initialization parameters on the source database.
compatible = 11.0
global_names = true
job_queue_processes = <current value> + 2
open_links = 4
parallel_max_servers = <current value> + 3
processes = <current value> + 4
sessions = <current value> + 1
streams_pool_size = <current value> + 20 MB
undo_retention = 3600
In this code snippet, you are specifying that the source database is Oracle Database release 11g (compatible = 11.0) and increasing the resource-related parameters, including raising STREAMS_POOL_SIZE by 20 MB, to accommodate the Streams capture components.
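These parameters can be set with ALTER SYSTEM. Note that some of them (such as PROCESSES and SESSIONS) are static and only take effect after a restart, so they are written to the SPFILE; the values below are purely illustrative:

-- Dynamic parameters take effect immediately
ALTER SYSTEM SET global_names = TRUE SCOPE = BOTH;
ALTER SYSTEM SET streams_pool_size = 50M SCOPE = BOTH;
-- Static parameters require a restart, so record them in the SPFILE only
ALTER SYSTEM SET processes = 200 SCOPE = SPFILE;
ALTER SYSTEM SET sessions = 225 SCOPE = SPFILE;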
Step 2 — Staging Database DBA: Set the Database Initialization Parameters
The staging database DBA sets the database initialization parameters as follows.
compatible = 11.0
global_names = true
java_pool_size = 50000000
open_links = 4
job_queue_processes = 2
parallel_max_servers = <current_value> + 2
processes = <current_value> + 3
sessions = <current value> + 1
streams_pool_size = <current_value> + 11 MB
undo_retention = 3600
Step 3 — Source Database DBA: Alter the Source Database
1. To do this, you first need to place the database in ARCHIVELOG mode by shutting down the database and then restarting it using the following commands:
shutdown immediate
startup mount
alter database archivelog;
alter database open;
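You can confirm that the database is now in ARCHIVELOG mode with a quick query against V$DATABASE:

-- LOG_MODE should now report ARCHIVELOG
SELECT log_mode FROM v$database;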
2. Place the database into FORCE LOGGING logging mode.
ALTER DATABASE FORCE LOGGING;
3. Enable supplemental logging.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
4. Create an unconditional log group on all columns to be captured in the source table.
ALTER TABLE sh.contacts ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
In this example, you are enabling supplemental logging of all columns of the sh.contacts table. The source database DBA will have to repeat this step for every source table for which change tables should be created.
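To verify that supplemental logging is in place, query V$DATABASE and DBA_LOG_GROUPS (a sketch; adjust the owner filter to your own schema):

-- Database-wide minimal supplemental logging should be YES
SELECT supplemental_log_data_min FROM v$database;
-- List the unconditional (ALWAYS) log groups created on the SH schema
SELECT log_group_name, table_name, always
FROM   dba_log_groups
WHERE  owner = 'SH';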
Step 4 — Source Database DBA: Create and Grant Privileges to the Publisher
CREATE USER source_streamspub IDENTIFIED EXTERNALLY
QUOTA UNLIMITED ON SYSTEM
QUOTA UNLIMITED ON SYSAUX;
GRANT CREATE SESSION TO source_streamspub;
GRANT DBA TO source_streamspub;
GRANT CREATE DATABASE LINK TO source_streamspub;
GRANT EXECUTE on DBMS_CDC_PUBLISH TO source_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO source_streamspub;
GRANT SELECT_CATALOG_ROLE TO source_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(
GRANTEE=> 'source_streamspub');
The source database DBA creates a data publisher user (in this case, source_streamspub) and grants privileges to this user so that they have the necessary permissions required to set up a database link from the source database to the staging database.
Step 5 — Staging Database DBA: Create and Grant Privileges to the Publisher
CREATE USER staging_streamspub IDENTIFIED EXTERNALLY
DEFAULT TABLESPACE ts_streamspub
QUOTA UNLIMITED ON SYSTEM;
GRANT CREATE SESSION TO staging_streamspub;
GRANT CREATE TABLE TO staging_streamspub;
GRANT CREATE TABLESPACE TO staging_streamspub;
GRANT UNLIMITED TABLESPACE TO staging_streamspub;
GRANT SELECT_CATALOG_ROLE TO staging_streamspub;
GRANT EXECUTE_CATALOG_ROLE TO staging_streamspub;
GRANT CONNECT, RESOURCE, DBA TO staging_streamspub;
GRANT CREATE SEQUENCE TO staging_streamspub;
EXECUTE DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(grantee => 'staging_streamspub');
GRANT CREATE DATABASE LINK TO staging_streamspub;
Here the staging database DBA creates a user, staging_streamspub, grants permissions and also creates change sets and change tables on the staging database.
Step 6 — Source Database Publisher: Create Database Link
CREATE DATABASE LINK staging_db
CONNECT TO staging_streamspub IDENTIFIED BY Stg406V5
USING 'staging_db';
In this example, you, the source database publisher, are creating a link from the source database to the staging database.
Step 7 — Staging Database Publisher: Create Database Link
Create a link from the staging database to the source database.
CREATE DATABASE LINK source_db
CONNECT TO source_streamspub IDENTIFIED BY Lvh256B2
USING 'source_db';
Step 8 — Staging Database Publisher: Identify the Change Source Database and Create the Change Sources
The following command creates a Distributed HotLog change source that identifies source_db as the change source database.
BEGIN
DBMS_CDC_PUBLISH.CREATE_HOTLOG_CHANGE_SOURCE(
change_source_name => 'MAILING_LIST',
description => 'test source',
source_database => 'source_db');
END;
/
Step 9 — Staging Database Publisher: Create Change Sets
The following command creates a change set called DAILY_MAILING_LIST that captures changes starting today and continues capturing change data indefinitely.
BEGIN
DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
change_set_name => 'DAILY_MAILING_LIST',
description => 'change set for contact info',
change_source_name => 'MAILING_LIST',
stop_on_ddl => 'y');
END;
/
Step 10 — Staging Database Publisher: Create the Change Tables
The following command creates a change table on the staging database that captures changes made to the source table (sh.contacts) in the source database.
BEGIN
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
owner => 'staging_streamspub',
change_table_name => 'contacts_ct',
change_set_name => 'DAILY_MAILING_LIST',
source_schema => 'SH',
source_table => 'CONTACTS',
column_type_list => 'PHONE_NUMBER NUMBER(10), CONTACT_NAME VARCHAR2(50),
ZIP_CODE NUMBER(5),
EMAIL_ADDRESS VARCHAR2(15), HOME_ADDRESS VARCHAR(20)',
capture_values => 'both',
rs_id => 'y',
row_id => 'n',
user_id => 'n',
timestamp => 'n',
object_id => 'n',
source_colmap => 'n',
target_colmap => 'y',
options_string => 'TABLESPACE TS_DAILY_MAILING_LIST');
END;
/
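Once the change set is enabled and changes start flowing, the change table can be queried directly. Besides the requested source columns, it contains control columns such as OPERATION$ and CSCN$ (and RSID$, since rs_id was set to 'y' above); this query is a sketch:

SELECT operation$, cscn$, rsid$, contact_name, phone_number
FROM   staging_streamspub.contacts_ct;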
Step 11 — Staging Database Publisher: Enable the Change Source
Alter the Distributed HotLog change source to enable it.
BEGIN
DBMS_CDC_PUBLISH.ALTER_HOTLOG_CHANGE_SOURCE(
change_source_name => 'MAILING_LIST',
enable_source => 'Y');
END;
/
Now, alter the change set to enable change capture.
BEGIN
DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
change_set_name => 'DAILY_MAILING_LIST',
enable_capture => 'y');
END;
/
Step 12 — Staging Database Publisher: Grant Access to Subscribers
The publisher should grant a subscriber access to change data so that they can create subscriptions. The following command assumes that the user subscriber1 already exists.
GRANT SELECT ON staging_streamspub.contacts_ct TO subscriber1;
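With SELECT access granted, subscriber1 can consume the change data through the DBMS_CDC_SUBSCRIBE package. The subscription and view names below are illustrative, not part of this guide:

BEGIN
  DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
    change_set_name   => 'DAILY_MAILING_LIST',
    description       => 'Subscription on contact changes',
    subscription_name => 'contacts_sub');
  -- Subscribe to the columns of interest; Oracle creates the subscriber view
  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => 'contacts_sub',
    source_schema     => 'SH',
    source_table      => 'CONTACTS',
    column_list       => 'CONTACT_NAME, PHONE_NUMBER',
    subscriber_view   => 'CONTACTS_VIEW');
  DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'contacts_sub');
  -- Extend the window to expose the newest change data, then read it
  DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'contacts_sub');
END;
/
SELECT * FROM contacts_view;
-- When done with this window, purge it so the change data can be aged out
EXECUTE DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'contacts_sub');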
Conclusion
You’ve learnt how to capture change data from one Oracle Database into another using Oracle Streams. You learnt how to set up redo logging and asynchronous distributed HotLog publishing.
Check out Oracle Streams Concepts and Administration for more information on Oracle Streams.
While Oracle Streams provides a platform for asynchronous CDC for your Oracle database, it is quite effort-intensive and requires in-depth technical expertise in working with the Oracle database. If you want to automate the process of CDC in your Oracle database, Hevo Data is the best product for you.
It will help you save on engineering bandwidth and costs while seamlessly enabling Oracle CDC without the requirement of any manual intervention.
Try a 14-day free trial and experience the feature-rich Hevo suite firsthand. Also, check out our unbeatable pricing to choose the best plan for your organization.
FAQs
1. What is the difference between Oracle Streams and Oracle GoldenGate?
Oracle Streams is basically used for simple data replication and capture. Oracle GoldenGate, on the other hand, has more advanced features, such as bidirectional replication and cross-platform support.
2. How does Oracle Streams CDC work?
It captures changes to data in the database through redo log files that store all transactions and propagates these changes to target systems.
3. Can Oracle Streams CDC be used for heterogeneous environments?
Although primarily designed for Oracle databases, it can interact with non-Oracle databases as well by integrating with Oracle GoldenGate for deeper levels of integration.
Vivek Sinha is a seasoned product leader with over 10 years of expertise in revolutionizing real-time analytics and cloud-native technologies. He specializes in enhancing Apache Pinot, focusing on query processing and data mutability. Vivek is renowned for his strategic vision and ability to deliver cutting-edge solutions that empower businesses to harness the full potential of their data.