Oracle Redo Log Based Replication: How to Set Up Real-time Data Load

on Tutorials, Data Integration • March 29th, 2020

In spite of the rising popularity of open-source database management systems, Oracle’s proprietary database remains one of the most preferred choices for running critical enterprise applications. Oracle’s powerful querying layer, rich analytical functions, enterprise-grade security, and granular access controls are a few of the reasons why enterprises still prefer it. It can run transactional, warehousing, or mixed workloads. In an Oracle-based enterprise architecture stack, there are many instances where Oracle databases have to be replicated in real time. A typical use case is when a separate database from the transactional one is used for warehousing. Oracle supports such replication by allowing access to its redo logs. Redo logs are pre-allocated files that store all the changes made to an Oracle database. This post covers the fundamentals of implementing Oracle redo log-based replication.

Understanding Oracle Redo Logs

Conceptually, Oracle redo logs are a collection of redo records, and each redo record is a group of change vectors. Each change vector describes a change made to a single database block. These entries are comprehensive enough to reconstruct the state of the database at any point in time, and they also make it possible to roll the database back.

While the database is running, redo entries are buffered in a circular in-memory buffer and persisted to the redo log files whenever a transaction commits or the buffer fills up. A transaction is usually marked as committed only after its redo entries are successfully persisted. All redo records corresponding to a specific transaction are identified using a unique identifier called the System Change Number (SCN).
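If you want to inspect the redo infrastructure on your own instance, the dynamic performance views expose it directly. The queries below are illustrative only and assume a user with SELECT privileges on the V$ views:

    -- List the redo log groups, their sizes, and their current status.
    SELECT GROUP#, THREAD#, BYTES/1024/1024 AS SIZE_MB, STATUS FROM V$LOG;

    -- List the physical redo log files backing each group.
    SELECT GROUP#, MEMBER FROM V$LOGFILE;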

Methods to Set Up Replication Using Oracle Redo Logs

Oracle replication using the redo logs can be set up in two ways.

  1. Setting up redo logs and implementing a custom application
  2. Using Hevo Data – An Automated Data Pipeline Platform

This blog discusses the first approach in detail. It also provides an introduction to the second approach so that you can evaluate both options and pick the one that suits you best. 

Setting Up Oracle Redo Logs

Replication using Oracle redo logs involves setting up a staging table that captures every change that happens in the source database, and then using a custom application to poll the staging table and insert the entries into the target data warehouse. A staging table is used so that there is minimal interference with the original table that serves transaction processing. There are three possible configurations in which a redo log-based staging database can be set up.

  1. Asynchronous hot-log configuration – In this configuration, the staging database is located on the same hardware instance as the source database. This is the simplest to set up and has the lowest latency, but it exerts the maximum load on the source database.
  2. Asynchronous distributed hot-log configuration – This differs from the above configuration in that the staging database runs on a separate hardware instance, and the changes captured from the redo logs are transported to the staging database over a database link between the two databases.
  3. Autolog configuration – In this configuration, a standby redo log is configured in the staging database to track the changes in the source database, and change sets are populated in the staging database from that standby redo log. This has the highest latency and is the most complicated to set up, but it works completely independently of the source database without exerting any load on it.

In this post, we will deal with setting up an asynchronous hot log configuration to replicate an Oracle database to an external data warehouse.
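Note that asynchronous change data capture reads changes from the redo and archive logs, so the source database is expected to run in ARCHIVELOG mode. A quick illustrative check:

    -- Should return 'ARCHIVELOG' for asynchronous change data capture.
    SELECT LOG_MODE FROM V$DATABASE;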

The first step is to enable logging in the source database. 

  1. Use the below command to configure the database to operate in FORCE LOGGING mode.

    ALTER DATABASE FORCE LOGGING;
  2. Enable minimal supplemental logging so that the redo records carry enough information to capture UPDATE statements.

    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
  3. Use the below command to enable logging for all columns of the concerned table.

    ALTER TABLE ts.products_data ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  4. Use the PREPARE_TABLE_INSTANTIATION procedure to prepare the source table for log data capture.

    BEGIN
      DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(TABLE_NAME => 'ts.products_data');
    END;
  5. Create a change set that will track the logs originating at the default HOTLOG_SOURCE. This log source is a predefined one provided by Oracle.

    BEGIN
      DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(
        change_set_name    => 'PRODUCT_DATA_SET',
        description        => 'Change data set for product info',
        change_source_name => 'HOTLOG_SOURCE',
        stop_on_ddl        => 'y',
        begin_date         => sysdate,
        end_date           => sysdate + 5);
    END;
  6. Create a change table where all the change information will be loaded as individual rows.

    BEGIN
      DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
        owner             => 'cdcpub',
        change_table_name => 'products_data_ct',
        change_set_name   => 'PRODUCT_DATA_SET',
        source_schema     => 'TS',
        source_table      => 'PRODUCTS_DATA',
        column_type_list  => 'ID NUMBER(6), NAME VARCHAR2(50), PRICE NUMBER(8,2)',
        capture_values    => 'both',
        rs_id             => 'y',
        row_id            => 'n',
        user_id           => 'n',
        timestamp         => 'n',
        object_id         => 'n',
        source_colmap     => 'n',
        target_colmap     => 'y',
        options_string    => 'TABLESPACE TS_PRODUCT_DATA_SET');
    END;
  7. Use the ALTER_CHANGE_SET procedure to enable the change set.

    BEGIN
      DBMS_CDC_PUBLISH.ALTER_CHANGE_SET(
        change_set_name => 'PRODUCT_DATA_SET',
        enable_capture  => 'y');
    END;
  8. To read the change data, a subscription to the change set must be created.

    BEGIN
      DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
        change_set_name   => 'PRODUCT_DATA_SET',
        description       => 'Change data for PRODUCTS',
        subscription_name => 'PRODUCT_DATA_SUB');
    END;
  9. Use the SUBSCRIBE procedure to subscribe to the source table and columns through the subscription created in the previous step. Note that the source schema and table must match the ones prepared earlier.

    BEGIN
      DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
        subscription_name => 'PRODUCT_DATA_SUB',
        source_schema     => 'TS',
        source_table      => 'PRODUCTS_DATA',
        column_list       => 'ID, NAME, PRICE',
        subscriber_view   => 'PRODUCT_DATA_VIEW');
    END;
  10. Use the below command to activate the subscription.

    BEGIN
      DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
        subscription_name => 'PRODUCT_DATA_SUB');
    END;
  11. Set up a collection window to process all the changes added since the last extension of the subscription window.

    BEGIN
      DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
        subscription_name => 'PRODUCT_DATA_SUB');
    END;

    The above command acts as a periodic polling job: each call adds the change data generated since the previous call to the subscription window.

  12. We can now use a select query to read all the changes from the view we created in step 9.
    SELECT ID, NAME, PRICE FROM PRODUCT_DATA_VIEW;

    This step should be made part of an automated script that repeatedly extends the window, reads the changes with the SELECT statement, and then uses the appropriate load method for the target data warehouse to insert the data. A sketch of one such polling cycle follows this list.
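To make the polling cycle concrete, below is a minimal sketch of a single iteration, written as an anonymous PL/SQL block. It consolidates steps 11 and 12 above and assumes a hypothetical staging table products_data_stage that a downstream warehouse loader reads from; everything else uses the objects created in the steps above.

    BEGIN
      -- Pull the change data generated since the last cycle into the window.
      DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'PRODUCT_DATA_SUB');

      -- Copy the visible changes into a staging table for the warehouse loader.
      -- products_data_stage is a hypothetical table created beforehand.
      INSERT INTO products_data_stage (id, name, price)
        SELECT ID, NAME, PRICE FROM PRODUCT_DATA_VIEW;
      COMMIT;

      -- Release the processed changes so they are not re-read and can
      -- eventually be purged from the change table.
      DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'PRODUCT_DATA_SUB');
    END;

A scheduler (for example, a DBMS_SCHEDULER job or an external cron task) would run this block at a fixed interval, giving near real-time propagation to the target.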

Now that was a long list of steps to set up an Oracle redo log-based replication, but the job is not over yet. The developer still needs to write a custom script to insert the data into the target data warehouse or database, and we will look at the typical challenges facing a developer who sets out to implement this process in the next section. Also, before the automated process can start capturing changes, an initial data dump must be taken from the source database and loaded into the target database.
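For the initial load, one common approach (a sketch, not the only option) is to record the current SCN and export a consistent snapshot of the table as of that SCN using a flashback query, so that change capture can take over cleanly from that point:

    -- Record the SCN at which the snapshot is taken (illustrative).
    SELECT CURRENT_SCN FROM V$DATABASE;

    -- Export a consistent image of the table as of that SCN; 1234567 is a
    -- placeholder for the value returned by the query above.
    SELECT ID, NAME, PRICE
    FROM ts.products_data AS OF SCN 1234567;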

Limitations of Setting Up Oracle Redo Log-based Replication Manually

  1. We did not consider any configuration changes related to memory management and optimization here. In reality, numerous configuration changes are needed to run this process efficiently.
  2. The engineers still need to implement a custom script to consume the change data, process it, and insert it into the target database.
  3. This implementation is best handled by an engineer with expert knowledge of Oracle administration, the target database, and programming. This means there is a steep learning curve before a clean execution is possible.

An alternative to going through all these hardships is to use a cloud-based automated data pipeline tool like Hevo, which can execute such replications in a matter of a few clicks.

An Easier Approach – Oracle Redo Log Based Replication Using Hevo

Hevo is a completely self-serve, fully managed, automated data pipeline that can help you implement Oracle redo log-based replication without having to write any code. Hevo’s point-and-click interface ensures the lowest possible time to production.

Here’s how simple it is to set up Oracle data replication with Hevo:

  1. Authenticate and connect your Oracle data source
  2. Select the replication mode as “Redo log-based replication”
  3. Point to the destination data warehouse or database where you want to move data

Hevo supports Oracle change data capture out of the box and will ensure that your data is loaded from Oracle to your target database or data warehouse in real time.

Hevo provides a greatly simplified user interface that can set up the above replication in a matter of a few clicks without the need for the developer to be an Oracle expert. 

Additionally, Hevo helps you clean, transform and enrich data both before and after replicating the database, ensuring that you have analysis-ready data in your warehouse at any point.

Experience the power and simplicity of implementing Oracle redo log-based replication by signing up for a 14-day free trial with Hevo.

What are your thoughts on setting up redo log-based replication? Let us know in the comments!
