Relational databases like Oracle have been the backbone of enterprise data management for years. However, as data volumes grow and the need for flexibility, scalability, and advanced analytics increases, modern solutions like Apache Iceberg are becoming essential. Iceberg’s open architecture and advanced features make it a compelling choice for organizations looking to optimize their data infrastructure.

In this blog, we’ll guide you through two approaches to migrate data from Oracle to Iceberg: manual and automated. We’ll also discuss key factors to consider during migration. Choose the method that best fits your needs, and let’s explore how you can make this transition seamless.

What is Oracle Database?


Oracle Database, developed by Oracle Corporation, is a leading relational database management system (RDBMS) designed to handle large-scale data storage, processing, and analytics. Known for its scalability and reliability, Oracle Database uses SQL and its proprietary PL/SQL for transaction processing, business intelligence, and data warehousing.

Key Features of Oracle Database

  1. Highly portable: runs on multiple operating systems, including Windows, Linux, and UNIX, with seamless networking between applications on different platforms.
  2. Features like Real Application Clusters (RAC) and In-Memory Database Cache (IMDB Cache) deliver high performance and low latency for both small-scale systems and large distributed databases.
  3. Tools like Oracle RAC, Data Guard, and Multitenant provide continuous database availability, minimizing downtime and offering disaster recovery for critical applications.
  4. Advanced security options, such as data encryption, masking, and access control through tools like Database Vault and Label Security, protect sensitive data and help meet industry compliance standards.
  5. Reliable backup and recovery tools, including the Zero Data Loss Recovery Appliance, restore data quickly after failures while minimizing backup time and ensuring data integrity.

What is Apache Iceberg?


Apache Iceberg is an open-source table format designed for managing large-scale analytic datasets in data lakes. Originally developed by Netflix to address the limitations of Apache Hive, Iceberg provides a reliable and efficient way to handle data, enabling advanced features like schema evolution, time travel, and ACID transactions, with consistent performance across query engines like Apache Spark, Flink, and Trino.

Key Features of Iceberg

  1. Schema evolution: make seamless changes to table schemas without costly rewrites or migrations, so it is easy to adapt to evolving data requirements (see the sketch after this list).
  2. With its snapshot-based architecture, Iceberg lets users query historical versions of data, which is useful for auditing, debugging, and reproducing past analyses.
  3. Supports atomicity, consistency, isolation, and durability (ACID) for reliable data operations even in multi-user environments, preventing partial updates or data corruption during transactions.
  4. Hidden partitioning automates and optimizes how data is organized, improving query performance without requiring users to manage partitions manually.
  5. Integrates seamlessly with popular big data tools like Apache Spark, Flink, Hive, and Presto, so your organization can use its preferred query engines while maintaining consistent table management.
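To make schema evolution and time travel concrete, here is a minimal PySpark sketch. It assumes a Spark session already configured with an Iceberg catalog named iceberg; the table db.events, the new column, and the snapshot ID are hypothetical placeholders:

python

# A minimal sketch of schema evolution and time travel. Assumes a Spark
# session with an Iceberg catalog named "iceberg"; the table db.events,
# the new column, and the snapshot ID are hypothetical placeholders.

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE iceberg.db.events ADD COLUMN country STRING")

# Time travel: query the table as of an earlier snapshot or point in time.
spark.sql("SELECT * FROM iceberg.db.events VERSION AS OF 4348571712497776035").show()
spark.sql("SELECT * FROM iceberg.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'").show()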

Why Move Oracle Data to Iceberg?

  • Open Architecture: Iceberg’s open-source architecture supports multiple query engines and file formats, providing freedom from vendor lock-in and enabling long-term adaptability to changing business needs.
  • Transactional Capabilities on Data Lakes: Iceberg supports ACID transactions in data lakes, ensuring reliable operations in large-scale distributed environments, making it suitable for applications requiring high data integrity.
  • Improved Query Performance: Iceberg optimizes query performance with techniques like columnar storage and hidden partitioning, often outperforming Oracle’s traditional indexing methods for analytical workloads.
  • Lower Storage Costs: Transitioning to Iceberg on a cloud-based data lake reduces storage costs while maintaining analytics capabilities, cutting operational expenses and freeing resources for higher-value work.
  • Seamless Data Collaboration: Iceberg’s versioning and metadata management simplify team collaboration by providing consistent dataset views, reducing errors due to data inconsistencies.

Steps to Migrate from Oracle Database to Apache Iceberg

Prerequisites

  • Database Version: Oracle 12c or above. Verify with SELECT BANNER FROM V$VERSION; (on Oracle 18c and later, you can use the BANNER_FULL column instead). A scripted version of this check appears after this list.
  • Permissions: SELECT permissions granted to the database user.
  • Redo Logs: Redo Log-based replication is enabled, along with SYSDBA privileges.
  • Database Name: Valid database name available.
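If you prefer to script the version and connectivity check, here is a minimal sketch using the python-oracledb driver; the host, service name, and credentials are placeholders for your environment:

python

# A minimal connectivity and version check, assuming the python-oracledb
# package (pip install oracledb); the host, service name, and credentials
# are placeholders for your environment.
import oracledb

conn = oracledb.connect(user="hevouser", password="your_password",
                        dsn="oracle-host:1521/ORCLPDB1")
with conn.cursor() as cursor:
    cursor.execute("SELECT banner FROM v$version")
    for (banner,) in cursor:
        print(banner)
conn.close()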

Method 1: Using Hevo

Hevo helps you migrate data from Oracle to Iceberg efficiently. Here’s a simplified approach:

  1. Create a new Oracle user and grant necessary privileges like CREATE SESSION, SELECT ON V_$DATABASE, etc. Make sure to set up Redo Logs by enabling archive log mode, configuring the fast recovery area, enabling supplemental logging, and adjusting PGA settings to prevent OutOfMemory errors.
  2. Once done, you can easily retrieve the database name and use it to configure Oracle as a source in Hevo with required connection details.
  3. Select Oracle as the source:
  • Navigate and Select Oracle Source:
    • Click PIPELINES in the Navigation Bar, then + CREATE PIPELINE.
    • On the Select Source Type page, select Edge and then Oracle.
  • Configure Oracle Source:
    • Source Name: Enter a unique name (e.g., Oracle Source).
    • Database Details:
      • Host: Enter the Oracle database host’s IP address or DNS (e.g., 192.168.2.5).
      • Port: Use the default port 1521 unless specified otherwise.
      • User: Use a user with read permissions (e.g., hevouser).
      • Password: Enter the user’s password.
      • Database Name: Specify the database from which you want to replicate data (e.g., demo).
  • Additional Settings and Testing:
    • Use SSH: Enable for secure connections via SSH tunnel.
    • Use SSL: Enable for encrypted connections and specify the CA file, client certificate, and client key.
    • Click TEST & CONTINUE to verify the connection. Once successful, proceed to set up your Destination.
  • Ensure your Oracle database is configured to accept connections from Hevo’s IP address if not using SSH.
  4. Configure Iceberg as your Destination

Prerequisites

  • AWS account and an IAM user with permissions to create IAM policies and IAM roles.
  • Available S3 bucket.
  • Available Glue database.
  • IAM role-based credentials ready.
  • Glue database and S3 bucket in the same AWS region.
  • Create IAM Policy:
    • In the AWS IAM Console (Access management -> Policies -> Create policy), create an IAM policy.
    • Use the JSON policy provided in the Hevo documentation. Replace the placeholders (<aws_region>, <your_aws_account_id>, <your_glue_database>, <your_bucket_name>) with your values.
    • This policy grants Hevo the s3:GetObject, s3:PutObject, glue:CreateTable, glue:DeleteTable, glue:GetTable, glue:GetTables, glue:UpdateTable, and glue:GetDatabase permissions.
  5. Obtain the External ID and IAM Role-Based Credentials:
    • In Hevo, go to DESTINATIONS -> Edge -> + CREATE EDGE DESTINATION -> Iceberg Data Lake.
    • Copy the External ID from the “Connect to your Iceberg” section.
    • Now, in the AWS IAM Console (Access management -> Roles -> Create role), create an IAM role.
      • Select Custom trust policy and paste the JSON trust policy from the Hevo documentation, replacing the External ID and AWS ARN placeholders with the External ID you copied above.
      • Attach the IAM policy created earlier.
    • Obtain ARN: On the Roles page, select the role and copy its ARN.
  6. Configure the Iceberg Data Lake Destination in Edge:
    • In Hevo, go to DESTINATIONS -> Edge -> + CREATE EDGE DESTINATION -> Iceberg Data Lake.
    • Enter a Destination Name.
  • In the “Connect to your Iceberg” section:
    1. Catalog Type: AWS Glue (default).
    2. IAM Role ARN: Paste the ARN you copied in Step 5.
    3. Glue Catalog Name: Your Glue database name (from the prerequisites).
    4. Region: AWS region for Glue and S3.
    5. Bucket Name: Your S3 bucket name.
    6. Path Prefix: Optional folder structure in S3.
  • Click TEST & SAVE.
  7. Data Mapping and Transformation (if needed):
    • Hevo provides options for data transformation. If your Oracle schema doesn’t directly align with your desired Iceberg schema, you can define transformations within Hevo.

For detailed instructions and the latest information, refer to the Hevo documentation: Oracle to Iceberg migration.

Method 2: Using Custom Code

Prerequisites for Migrating Oracle to Iceberg

  • Access and Permissions: Ensure admin access to Oracle with permissions for exporting tables. Also, verify access to Spark and Iceberg environments.
  • Spark and Iceberg Setup: Install Apache Spark configured with Iceberg libraries. Ensure compatibility with your cloud storage system (e.g., S3, HDFS).
  • Schema and Export Tools: Verify that Oracle table schemas map cleanly to the Parquet/ORC formats used by Iceberg. Install Oracle Data Pump or use custom scripts (such as Spark’s JDBC connector) for data export. A Spark session configuration sketch follows this list.
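The steps below assume a Spark session with an Iceberg catalog already configured. Here is a minimal sketch assuming Spark 3.5, a Hadoop-style catalog named iceberg, and an S3 warehouse; the runtime version, catalog type, and bucket are placeholders to adjust for your environment:

python

# A minimal Spark session with an Iceberg catalog. The catalog name "iceberg"
# matches the table identifiers used in the steps below; the runtime version,
# catalog type, and warehouse bucket are placeholder assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oracle-to-iceberg")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3a://your-bucket/warehouse")
    .getOrCreate()
)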

Now let’s walk through the steps for migrating from Oracle to Iceberg.

Step 2.1: Export Data from Oracle
Use Oracle Data Pump to export the source tables:

bash

expdp user/password DIRECTORY=dp_dir DUMPFILE=table_data.dmp TABLES=your_table

Note: Data Pump writes proprietary .dmp files, not Parquet, so plan a conversion step before Spark can read them. Store the exported (and converted) files in your data lake (e.g., S3). Alternatively, skip the dump-and-convert round trip by reading Oracle directly through Spark’s JDBC connector, as sketched below.
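Here is a minimal sketch of the JDBC route, which stages the table directly as Parquet; the host, service name, credentials, and bucket are placeholders, and the Oracle JDBC driver (e.g., ojdbc8.jar) must be on the Spark classpath:

python

# Read the Oracle table over JDBC and stage it as Parquet in S3.
# The host, service name, credentials, and bucket are placeholders, and the
# Oracle JDBC driver must be available on the Spark classpath.
(
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
    .option("dbtable", "your_table")
    .option("user", "hevouser")
    .option("password", "your_password")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
    .write.mode("overwrite")
    .parquet("s3://your-bucket/oracle-data/")
)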

Step 2.2: Create Iceberg Table in Spark
Define a new Iceberg table in Spark matching the Oracle schema. Partitioning on a day transform of the timestamp, rather than the raw column, keeps the partition count manageable:

python

spark.sql("""
    CREATE TABLE iceberg.db.your_table (
        id INT,
        name STRING,
        created_at TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(created_at))
""")

Step 2.3: Load Exported Data into Iceberg Table
Use Spark to load the staged Parquet files into the new Iceberg table:

python

spark.read.format("parquet").load("s3://your-bucket/oracle-data/") \
    .write.format("iceberg").mode("append") \
    .save("iceberg.db.your_table")

Step 2.4: Validate Migration Accuracy
Run queries on the Iceberg table to verify data integrity:

python

spark.sql("SELECT * FROM iceberg.db.your_table LIMIT 10").show()

Step 2.5: Switch Operations to Iceberg Table
After validation, point all workloads from Oracle to the new Iceberg table.

Limitations of Manual Migration for Oracle to Iceberg

  1. Resource Intensive: Requires significant manual effort and expertise in both Oracle and Spark/Iceberg environments.
  2. Downtime Risks: May involve downtime during migration, especially for large datasets or critical systems.
  3. Lack of Automation: No built-in monitoring or error-handling mechanisms, increasing operational complexity.

Use Cases of Connecting Oracle Database and Iceberg

  • Data Offloading for Cost Savings: Oracle data can be offloaded to Iceberg in a cloud-based data lake, reducing storage costs while maintaining accessibility for analytics.
  • ACID Transactions on Data Lakes: Iceberg brings ACID compliance to Oracle data stored in a data lake, ensuring reliability for distributed processing workflows.
  • Schema Evolution Without Downtime: Iceberg allows schema changes (e.g., adding or renaming columns) without requiring costly migrations or downtime, unlike traditional Oracle setups.
  • Efficient Data Compaction: Iceberg supports file compaction through built-in maintenance procedures, optimizing Oracle data stored in a lake for better performance and reduced storage overhead (see the sketch after this list).
  • Cross-Platform Querying: By integrating Oracle with Iceberg, engineers can query the same dataset using multiple engines like Flink or Presto, enabling flexibility in tool choice and analytics workflows.
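For example, compaction can be run on demand from Spark with Iceberg’s rewrite_data_files maintenance procedure; here is a minimal sketch, assuming a session with the Iceberg SQL extensions enabled and the placeholder names used earlier:

python

# Compact small data files with Iceberg's rewrite_data_files maintenance
# procedure. Requires the Iceberg SQL extensions on the Spark session;
# the catalog and table names are placeholders.
spark.sql("CALL iceberg.system.rewrite_data_files(table => 'db.your_table')").show()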

Conclusion

Migrating Oracle Database data to Iceberg helps you unlock the benefits of modern data lake architecture, such as scalability, cost savings, and advanced analytics. You can use manual methods like Spark SQL for granular control or automated tools for a faster, hassle-free migration process. Either way, the move gives you greater flexibility and better data management.

Hevo offers a team of experts to guide you through every step of your migration journey. If you are looking for an automated, zero-hassle solution, sign up for a free trial with Hevo today and experience seamless Oracle-to-Iceberg migration.

Frequently Asked Questions

1. Does Oracle Support Iceberg?

Oracle does not use Iceberg as its native storage format, but Oracle tools and services can query Iceberg tables stored in cloud object storage, so the two can coexist in the same analytics stack.

2. How to Migrate Oracle Database from One Server to Another?

Migrate Oracle databases by exporting data using tools like Data Pump, moving it to the new server, and importing it. Tools like RMAN or GoldenGate can also be used.

3. How to Migrate Oracle to Snowflake?

You can move your Oracle data to Snowflake using Hevo in two simple steps. Read through migrating Oracle data to Snowflake for detailed steps.

4. What Does Apache Iceberg Do?

Apache Iceberg helps manage large datasets in data lakes with features like schema evolution and time travel. It works with engines like Spark and Flink for efficient analysis.

Srishti
Data Engineer

An experienced big data engineer, specializing in designing, deploying, and optimizing scalable data pipelines with unmatched expertise in Spark, dbt, Iceberg, Hadoop, and SQL. She presents her thought leadership viewpoints, drawn from extensive hands-on expertise in real-time data processing. She ensures peak performance and data quality, delivering high-performance data solutions that fuel business growth.