Oracle is a popular on-premise relational database system suited for a wide range of workloads. It can handle transactions, data warehouse loads, and mixed loads. Even though it is designed as an on-premise solution, a recent offering from Oracle also includes a completely managed service. Oracle’s comprehensive querying layer makes it a great analysis tool and a favorite in ETL pipelines. It also comes with a suite of applications to help organizations in data integration, replication, and data transformation. Oracle’s stability and short recovery times make it a great choice for running critical enterprise transactional loads.
A common use case in the ETL scenario is the need for Oracle database replication to a separate data warehouse. This is done so that the data from Oracle can be combined with the data coming from other sources, to extract deeper analysis results. In this post, you will learn about the different methods available for Oracle database replication to other data warehouses.
Let’s see how this blog is structured:
- Understanding Oracle Database Replication
- Method 1: Using A Full Backup And Load Approach
- Method 2: Using A Trigger-Based Approach
- Method 3: Using Hevo Data
- Method 4: Using Oracle Golden Gate CDC
- Method 5: Using Custom Script-Based On Binary Log
Understanding Oracle Database Replication
Oracle provides its own set of tools to replicate Oracle and integrate it with other databases. In this post, you will explore the tools provided by Oracle as well as open-source tools that can be used for the Oracle database replication by implementing custom code.
Methods For Oracle Database Replication
Method 1: Using A Full Backup And Load Approach
This method takes a full dump of the table that you want to replicate in specific intervals of time to generate a snapshot in the preferred format. You can then use this dump file to load another table in a different data warehouse or execute some custom transformations. With Oracle, this can be done easily by using /*csv*/ annotation with the query.
An example of such a dump file generation can be done in SQL Plus as follows:
SPOOL "/data/emp.csv" SELECT /*csv*/ * FROM EMP; SPOOL OFF
The above command will generate a CSV file, which can then be loaded into the target database.
As you may have guessed already, the problem with this approach is that it can only be used with smaller tables. But this is a very reliable method for smaller dimension tables in an ETL.
An alternative to this approach is to load the target database incrementally with only the changes from the last load. To execute such a maneuver, you need to have access to a second Oracle table. Oracle provides a very convenient method to take the difference between two tables by using the following command.
SELECT * FROM new_version MINUS SELECT * FROM old_version;
The above command will provide you with all the inserts and updates in the new version of the table. To execute this correctly, an old version of the table must be replaced with the new version every time data is loaded into the target database.
As already evident, even with the incremental load approach, the load process is inefficient. It is because, at some point in the pipeline, a table is being completely replaced by the new version of the table. This is usually done at predefined intervals.
Summarizing, the above method has the following drawbacks:
- It can only be used with smaller dimension tables.
- It is very inefficient because of the complete overwrite of the involved tables.
- Real-time replication is not feasible with this approach. Only a predefined interval-based load is possible.
- There is a significant performance hit to the source database while the data file generation is in progress. This can be worked around by configuring a multi-master replication though.
Method 2: Using A Trigger-Based Approach
A trigger is a function that can be configured to execute when specific events occur. Triggers can be configured, such that when a source table is modified, a record is inserted into another table. This table can then be used to replicate Oracle data to a target database by using some custom code. The trigger-based approach is a synchronous process because the transactions in the source database will not get complete unless the triggers are successfully executed. It means that there is a performance hit in this case also. But using a multi-master replicated instance of Oracle as the source database can avoid this problem.
Oracle provides a built-in stored procedure to configure the trigger and watch the source table for updates. The crux of this approach is the following command that configures a sync_source from where a ‘change table’ can read changes.
BEGIN DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE( owner => 'cdcpub', change_table_name => 'employee_change_set', change_set_name => 'EMP_DAILY', source_schema => 'SH', source_table => 'EMPLOYEE', column_type_list => 'EMP_ID NUMBER(6), EMP_NAME VARCHAR2(50), EMP_AGE NUMBER(8,2)', capture_values => 'both', rs_id => 'y', row_id => 'n', user_id => 'n', timestamp => 'n', object_id => 'n', source_colmap => 'y', target_colmap => 'y', options_string => 'TABLESPACE TS_EMP_DAILY'); END;
The above command creates a changeset called ‘employee_change_set’ based on the table EMPLOYEE. More details on this approach can be found on this page.
Even though it solves the lack of real-time support problems in the first approach, this method still has some drawbacks.
- There is a performance hit in the source database since triggers can delay the completion of a transaction.
- For inserting into a target database, there is still significant work involved in building a custom script to subscribe to the changeset table and write accordingly.
Method 3: Using Hevo Data
Hevo is a No-code Data Pipeline. It helps you achieve Oracle database replication in real-time. It offers a fully-automated service and also ensures that your data is secure and consistent.
To achieve Oracle database replication using Hevo, follow these steps:
- Authenticate and connect your Oracle source.
- Configure your destination and start your data replication.
Let’s see some key highlights of Hevo Data:
- Easy Setup And Highly Intuitive User Interface: Hevo has a minimal learning curve and can be set up in minutes. Once the user has quickly configured and connected both the data source and the destination data warehouse, Hevo moves data in real-time.
- Fully Managed: Neither coding nor pipeline maintenance is required by your team.
- Unlimited Integrations: Hevo can provide connectivity to numerous cloud-based and on-site assets. Check out the complete list here.
- Automatic Schema Mapping: Hevo automatically detects the schema of the incoming data and maps it to the destination schema. This feature makes you free from the tedious job of manually configuring schema.
- Effortless Data Transformations: Hevo provides a simple python interface to clean, transform, and enrich any data before moving it to the data warehouse. Read more on Hevo’s transformation here.
Give Hevo a try by signing up for a 14-day free trial today.
Method 4: Using Oracle Golden Gate CDC
Oracle Golden Gate is Oracle’s solution for all things related to ETL and replication with Oracle as the source database. Golden Gate is a paid product from Oracle that can connect to several target databases through built-in handlers. Golden Gate’s Kafka handler makes it easy for organizations to set up custom pipelines in case their preferred target database handler is not present in the Golden Gate.
Golden gate works based on binary log-based replication. Oracle binary logs are a historical record of all the changes that have happened in a database since its creation. Watching the binary logs can provide you with events related to all the changes in an Oracle database. Golden Gate’s biggest strength is its true real-time support and a large number of target databases through its built-in handlers. It also supports the transformation of data while loading it to the target database.
Drawbacks of this approach are listed below:
- Golden Gate configuration is a complex process and needs an expert Oracle admin to get it right. A detailed post on how to set this up can be found here.
- The initial load of the target database is still a problem because Golden Gate is notorious for its slow first-time load. Most database administrators still prefer using the traditional dump and export method for the first load.
Method 5: Using Custom Script-Based On Binary Log
As mentioned in the previous section, Oracle binary log is a complete record of all the changes that have taken place in an Oracle table from the beginning. It means any custom code that can read this binary log can aid in creating a replication script. Fortunately, you don’t have to implement such binary log mechanisms. There are some open-source tools available that can read these binary log events and expose them for consumption.
One such library is the kafka-connect-oracle hosted here. This utility reads binary logs and adds the event into a Kafka stream. To implement a custom script, you only need to process the Kafka stream events and write accordingly to the target database. The biggest advantage of using such a utility is that it spares you from spending money on the Golden Gate license. For the same reason, Oracle actively discourages you from using such utilities and may even remove the features that are used by such libraries in future releases.
This method may seem like the perfect solution, but like everything else in life, there are certain drawbacks to this approach too.
- It does not support DDL operations, and so events related to new table creation and deletion are not captured.
- It is not an officially supported solution, so in case of a problem, while setting this up, you are at the mercy of the open-source community. It may be hard to digest for organizations that are used to Oracle’s enterprise-grade customer support.
As evident from the above discussion, no approach is perfect, and the decision depends a lot on the on-ground scenarios in an organization and the compromises that you are willing to make in your criteria. If you are tired of making compromises and want a simple, easy-to-use, and completely managed Oracle database replication experience, you should have a look at what Hevo offers.
Sign up for a 14-day free trial and experience efficient and effective ETL.
Share your experience on working with Oracle database replication. Get in touch with us in the comment section below.