Loading Data from Oracle to Redshift: 2 Easy Methods

on Tutorial • September 6th, 2021 • Write for Hevo

Is your Oracle server getting too slow for analytical queries now? Or do you think you are paying too much money to increase the storage capacity or compute power of your Oracle instance? Or are you looking to join and combine data from multiple databases seamlessly? Whatever the case may be, Amazon Redshift offers amazing solutions to the above problems.  Hence there is little to think about before moving your data from an Oracle to Amazon Redshift cluster.

This article covers the basic idea behind the two architectures and the detailed steps you need to follow to migrate data from Oracle to Redshift. Additionally, it also covers why you should consider implementing an ETL solution such as Hevo Data to make the migration smooth and efficient.

Table of Contents

Overview on Oracle and Amazon Redshift

Oracle logo
Image Source

Oracle is fundamentally a Proprietary, Multi-Model, Relational Database System used for Data Warehousing and Online Transaction Processing (OLTP). However, the most recent versions include features similar to cloud-based solutions (such as Amazon Redshift) like columnar storage, on-cloud deployment, etc.

Redshift logo
Image Source

Amazon Redshift is a PostgreSQL standard-based, efficiently scalable, entirely managed, on-cloud database optimized for Online Analytical Processing (OLAP) and data warehousing. One can get things started very quickly in just two steps –  

  • Launch a Redshift cluster via simple API calls or through the AWS Management Console.
  • Connect the local SQL client to this Redshift instance.

There are many advantages of Redshift’s Unique Architecture. Deciding to move data from Oracle to Redshift is the right step in stepping up your data analytics infrastructure.

Methods to Load Data from Oracle to Redshift

Method 1: Custom ETL Scripts to Load Data from Oracle to Redshift

Hand code ETL scripts and configure jobs to move Oracle data to Redshift

Method 2: Setting Up Oracle to Redshift Integration using Hevo Data

Hevo Data, a No-code Data Pipeline, provides you a fully automated platform to set up Oracle to Redshift Integration for free. It is a hassle-free solution to directly connect Oracle to Redshift when you don’t have technical expertise in this field.

Sign up here for a 14-day Free Trial!

Methods to Load Data from Oracle to Redshift

Oracle to Redshift Integration image
Image Source

There are majorly 2 methods of loading data from Oracle to Redshift:

Let’s walk through these methods one by one.

Method 1: Custome ETL Scripts to Load Data from Oracle to Redshift

It is a really easy and straightforward way to move data from Oracle to Amazon Redshift. This method involves 4 major steps:

These steps are illustrated in technical detail via an example in the following section.

Step 1: Exporting Data from an Oracle Table via Spool

One of the most common ways to export Oracle data onto a flat-file is using the Spool command. Here’s an example of how to do it –

SPOOL c:oracleorgemp.csv

SELECT employeeno || ',' ||

       employeename || ',' ||

       job || ',' ||

       manager || ',' ||

       TO_CHAR(hiredate,'YYYY-MM-DD') AS hiredate || ',' ||

       salary || ',' ||

FROM   employee

ORDER BY employeeno;

SPOOL OFF

The above code exports all records available in employees into the emp.csv file under the org folder as mentioned. The CSV file could then be zipped (using “$ gzip emp.csv”) for compression before moving to the AWS S3 Bucket.

Step 2: Copying a Flat File onto an AWS S3 Bucket

AWS provides S3 Buckets to store files that could be loaded into an Amazon Redshift instance using the COPY command. To drop a local file into an AWS S3 Bucket, you could run a ‘COPY command’ on the AWS Command Line Interface. Here’s how you would do it –

aws s3 cp //oracle/org/emp.csv s3://org/empl/emp.csv.gz

However, if you’d prefer the Graphical User Interface (GUI) way, you could go over to your AWS S3 console https://console.aws.amazon.com/s3/home, and copy-paste your “emp.csv” file into the desired Amazon S3 Bucket.

Step 3: Creating an Empty Table and Loading Data from the AWS S3 Bucket

Before running the COPY command, an empty table must be created in the database to absorb the “emp.csv” file now available on the Amazon S3 Bucket.

The employee table on Redshift can be created using the following code:

SET SEARCH_PATH TO PUBLIC;       // selecting the schema

CREATE TABLE EMPLOYEE (

  cmployeeno INTEGER NOT NULL, 

  employeename VARCHAR,

  job VARCHAR,

  manager VARCHAR,

  hiredate DATE,  

  salary INTEGER

  DISTKEY(hiredate)

  SORTKEY(employeeno)

)

The flat file copied over to S3 can be loaded into the above table using the following :

SET SEARCH_PATH TO PUBLIC; 

COPY EMPLOYEE

FROM 's3://org/empl/emp.csv.gz'

'AWS_ACCESS_KEY_ID=MY_ACCESS_KEY AWS_SECRET_ACCESS_KEY=MY_SECRET_KEY'

GZIP;

Once you are done with the above steps, you need to increment the load from Oracle to Redshift. So, keep reading!

Incremental Load from Oracle to Redshift

The above is an example to demonstrate the process of moving data from Oracle to Redshift. In reality, this would be performed, typically every day, on an entire database consisting of 10s or 100s of tables in a scheduled and automated fashion. Here is how this is done.

Step 1: Iterative Exporting of Tables

The following script will go through each table one by one. Next, it will export the data in each of them into a separate CSV file with the filename as the *name of the table*_s3.

begin

 for item in (select table_name from user_tables) 

  loop

    dbms_output.put_line('spool '||item.table_name||'_s3.csv');

    dbms_output.put_line('select * from'||item.table_name||’;’);  

    dbms_output.put_line('spool off');

  end loop;

end;

Step 2: Copying CSV Files to AWS S3

The exported .csv files can be uploaded to an S3 bucket using the following command:

aws s3 cp <your directory path> s3://<your bucket name> --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers --recursive

Step 3: Importing AWS S3 Data into Redshift

As mentioned before, this process is typically done every 24 hours on a whole lot of data. Hence, you must ensure that there is no data loss as well as no duplicate data. The 2nd part (duplicate data) is particularly relevant when copying data over to Redshift because Redshift doesn’t enforce Primary Key constraints.

Now, you can drop all the data in your Redshift instance and load the entire Oracle Database every time you are performing the data load. However, this is quite risky in regards to data loss and also very inefficient and computationally intensive. Hence, a good way to efficiently perform the data loads while ensuring data consistency would be to:

  • Copy the AWS S3 flat file data into a temp table: This is achieved by running the ‘COPY’ command the same way as explained in “Step 3” before.
  • Compare the temp table data with the incoming data (the .csv files): See the section Data Loads: SCD Type 1 and Type
  • Resolve any data inconsistency issues: See the section Data Loads: SCD Type 1 and Type 2
  • Remove data from the Parent Table and copy the new and clean up data from the Temp Table. Run the following commands:
begin;

delete from employee where *condition* (depends on what data is available in the temp table)

insert into employee select * from emp_temp_table;

end;  

Data Loads – SCD Type 1 and Type 2

Generally, while comparing the existing table data with the new stream of data (S3 bucket data, in this case) one or both of the following methods is used to complete the data load:

Type 1 or Upsert: A new record is either inserted or updated. The update happens only when the primary key of the incoming record matches with the primary key of an existing record. Here is an example:

Existing Record:

EmployeenameEmployeenumJob
John123Sales

Incoming Record:

EmployeenameEmployeenumJob
John123Operations

Final Table (After Upsert):

EmployeenameEmployeenumJob
John123Operations

Type 2 or Maintain History: In this scenario, if the primary key of the incoming record matches with the primary key of an existing record, the existing record is end dated or flagged to reflect that it is a past record. Here is the Type 2 for the above example –

Existing Record:

EmployeenameEmployeenumJob
John123Sales

Incoming Record:

EmployeenameEmployeenumJob
John123Operations

Final Table (After Type 2):

EmployeenameEmployeenumJobActiveflag
John123Operations1
John123Sales0

Limitations of Using Custom ETL Scripts to Load Data from Oracle to Redshift

Although a Custom Script (or more likely a combination of Custom Scripts) written to execute the above steps will work, it will be tedious to ensure the smooth functioning of such a system due to the following reasons:

  • There are many different kinds of steps that are needed to be executed in a dependent fashion without failure.
  • The incremental load is especially difficult to code and execute in such a way as to ensure there is no data loss and/or data inconsistencies. Doing a full load every time puts a lot of load on the Oracle database. 
  • As mentioned, this is typically done once every day. Lately, however, people want to look at more real-time data. Hence, this will have to be executed a lot more frequently than once in 24 hours. That is going to test the robustness and thoroughness of your solution a lot more.

Method 2: Setting Up Oracle to Redshift Integration using Hevo Data

Hevo Data image
Image Source

Hevo Data, a No-code Data Pipeline helps to Load Data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources including Oracle, etc., and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo loads the data onto the desired Data Warehouse, enriches the data, and transforms it into an analysis-ready form without writing a single line of code.

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.

Get Started with Hevo for free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-day Free Trial!

Conclusion

Furthermore, Hevo has an intuitive user interface that lets, even the not-so-technical people, easily tweak the parameters of your data load settings. This would come in super handy once you have everything up and running.

With Hevo, you can achieve seamless and accurate data replication from Oracle to Redshift. With its fault-tolerant architecture, Hevo ensures that no data is lost while loading. This empowers you to focus on the right projects instead of worrying about data availability.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of loading data from Oracle to Redshift in the comments section below!

No-code Data Pipeline for Redshift