Press "Enter" to skip to content

Oracle to Snowflake ETL: Steps to Move Data Easily


Oracle databases do not need an introduction. They have ruled the data landscape for decades, and many large organizations heavily depend on Oracle to run mission-critical workloads. While Oracle has many advantages as a database, there are strong reasons to move data from Oracle to the Snowflake Data Warehouse:

    • Snowflake offers high computational power. If many concurrent users are running complex queries, the compute capacity of a Snowflake warehouse can be scaled dynamically, so there is less waiting time for complex query executions.
    • The agility and elasticity offered by the Snowflake Cloud Data Warehouse are unmatched. You can scale only when needed and pay only for what you use.
    • Snowflake is a completely managed service. This means you can get your analytics projects running with minimal engineering resources.
    • Snowflake lets you work seamlessly with semi-structured data, which is much harder to analyse in Oracle.

Methods to move data from Oracle to Snowflake:

Here are two ways that can be used to approach Oracle to Snowflake ETL:

Method 1: Implement a hassle-free, no-code Data Integration Platform like Hevo Data – 14 Day Free Trial (Official Snowflake ETL Partner)

Method 2: Write custom ETL scripts to move data from Oracle to Snowflake

Through this article, you will get a step-by-step understanding of how to write custom code to move data from Oracle to Snowflake. Towards the end of the blog, the limitations of this approach and ways to work around them are highlighted.

Move data from Oracle to Snowflake Using Custom Code: 

Steps to move data from Oracle to Snowflake can be categorised as follows:

  1. Extract data from Oracle to CSV using SQL*Plus 
  2. Data type conversion and other transformations
  3. Staging files to S3
  4. Finally, copy staged files to the Snowflake table

Let us go through these steps in detail.

Step 1: Extract data from Oracle to a CSV file

SQL*Plus is a query tool installed with every Oracle Database Server or Client installation. It can be used to run a SQL query and redirect its result to a CSV file. The command used for this is SPOOL.

Eg:

-- Turn on the spool

    spool spool_file.txt

 -- Run your Query

    select  * from dba_table;

 -- Turn off spooling

    spool off;

Once spooling is turned on, all query results are written to the file specified in the command (in this case spool_file.txt) until spooling is turned off.

  • The spool file will not be visible until spooling is turned off.
  • If the spool file doesn’t already exist, a new file will be created. If it exists, it is overwritten by default. From Oracle 10g onwards there is an APPEND option that adds the output to an existing file, as shown below.
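For example, to append to an existing spool file instead of overwriting it (Oracle 10g or later):

    spool spool_file.txt append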

Most of the time, the data extraction logic will be executed in a shell script. Here is a very basic example script that extracts the full data from an Oracle table:

#!/usr/bin/bash

FILE="students.csv"

sqlplus -s user_name/password@oracle_db <<EOF

SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF

SPOOL $FILE

SELECT * FROM STUDENTS;

SPOOL OFF
EXIT
EOF
  • SET PAGESIZE – The number of lines per page. The header line is printed on every page.
  • SET COLSEP – Sets the column separator.
  • SET LINESIZE – The number of characters per line. The default is 80. Set this high enough that each record fits on a single line.
  • SET FEEDBACK OFF – Suppresses feedback messages so that they do not appear in the CSV file.
  • SPOOL $FILE – The file to which the results of the query are written.
  • SELECT * FROM STUDENTS – The query executed to extract data from the table.
  • SPOOL OFF – Stops writing the contents of the SQL session to the file.

Incremental Data Extract

As discussed in the section above, once spooling is on, any SQL can be run and the result is redirected to the specified file. To extract data incrementally, you need to generate SQL with conditions that select only the records modified after the last data pull.

Eg:

select * from students where last_modified_time > last_pull_time and last_modified_time <= sys_time;

Now the result set will contain only the records that changed after the last pull.
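Here is a minimal sketch of how such an incremental extract could be scripted, assuming the STUDENTS table has a LAST_MODIFIED_TIME column and the timestamp of the previous pull is kept in a local file named last_pull_time.txt (both the column and the file are illustrative assumptions):

#!/usr/bin/bash

# Illustrative sketch: LAST_MODIFIED_TIME and last_pull_time.txt are assumptions
FILE="students_delta.csv"
LAST_PULL=$(cat last_pull_time.txt)   # e.g. 2020-01-01 00:00:00

sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS
WHERE LAST_MODIFIED_TIME > TO_DATE('$LAST_PULL', 'YYYY-MM-DD HH24:MI:SS')
  AND LAST_MODIFIED_TIME <= SYSDATE;
SPOOL OFF
EXIT
EOF

# Record the new high-water mark for the next run; in practice you would
# capture this value from the database rather than the local clock
date +"%Y-%m-%d %H:%M:%S" > last_pull_time.txt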

Step 2: Data type conversion and formatting

While transferring data from Oracle to Snowflake, data might have to be transformed as per business needs. Apart from such use case specific changes, there are certain important things to be noted for smooth data movement.

  • Many errors can be caused by a mismatch between the character sets of the source and target. Note that Snowflake supports all major character sets including UTF-8 and UTF-16. The full list can be found here.
  • While moving data from Oracle to Big Data systems, data integrity is often compromised due to the lack of support for SQL constraints. Snowflake supports all common SQL constraints like UNIQUE, PRIMARY KEY, FOREIGN KEY and NOT NULL, which is a great help for making sure data has moved as expected (note that, except for NOT NULL, these constraints are not enforced by Snowflake).
  • Snowflake’s type system covers most primitive and advanced data types, including nested data structures like struct and array. Below is a table mapping Oracle data types to their Snowflake counterparts.
Oracle Data Type                Snowflake Data Type
NUMBER                          BYTEINT
NUMBER                          SMALLINT
NUMBER                          INTEGER
NUMBER                          BIGINT
NUMBER                          DECIMAL
FLOAT                           FLOAT
NUMBER                          NUMERIC
CHAR                            CHAR (up to 16 MB)
VARCHAR2/NVARCHAR2              VARCHAR (up to 16 MB)
CHAR(n)                         CHAR VARYING(n)
FLOAT                           REAL
DATE                            DATE
TIMESTAMP (only HH:MI:SS)       TIME
TIMESTAMP WITH TIME ZONE        TIMESTAMP_LTZ (aliases: TIMESTAMPLTZ, TIMESTAMP WITH LOCAL TIME ZONE)
BINARY_FLOAT/BINARY_DOUBLE      BINARY (up to 8 MB)
BINARY_FLOAT/BINARY_DOUBLE      VARBINARY
CLOB / VARCHAR2                 VARIANT
CLOB / VARCHAR2                 ARRAY
  • Often, date and time formats require a lot of attention while creating data pipelines. Snowflake is quite flexible here as well. If a custom format is used for dates or times in the file to be loaded into the table, it can be specified explicitly using the File Format Option, as in the example below. The complete list of date and time formats can be found here.
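For instance, if dates in the extracted file are written as DD-MON-YYYY, a named file format along these lines could be created and later referenced from a stage or a COPY command (the format name is illustrative):

create or replace file format my_oracle_csv_format
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  date_format = 'DD-MON-YYYY'
  timestamp_format = 'DD-MON-YYYY HH24:MI:SS';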

Step 3: Stage Data Files

To load data into Snowflake, it has to be uploaded to a cloud staging area first. If your Snowflake instance runs on AWS, the data has to be uploaded to an S3 location that Snowflake has access to. This process is called staging. A Snowflake stage can be either internal or external.

Internal Stage

If you choose this option, each user and table is automatically assigned an internal stage that can be used to stage data related to that user or table. Internal stages can also be created explicitly with a name.

  • For a user, the default internal stage is named ‘@~’.
  • For a table, the default internal stage has the same name as the table.
  • The default internal stage associated with a user or table cannot be altered or dropped.
  • Unlike named stages, file format options cannot be set on default user or table stages.
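As a quick illustration of how these default stages are addressed (the table name is only an example, and PUT has to be run from SnowSQL or another client that supports it):

-- List files in your default user stage
list @~;

-- Stage a file into the default stage of a table named STUDENTS
put file:///tmp/oracle_data/students.csv @%students;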

If an internal stage is created explicitly with a name using SQL statements, many data loading options can be assigned to the stage, such as file format, date format, etc. When data is loaded into a table through this stage, those options are applied automatically.

Note: The rest of this document discusses many Snowflake commands. Snowflake comes with a very intuitive and stable web-based interface to run SQL and commands. However, if you prefer to work with a light-weight command line utility to interact with the database you might like SnowSQL – a CLI client available in Linux/Mac/Windows to run Snowflake commands. Read more about the tool and options here.
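For example, assuming SnowSQL is installed, a connection is typically opened with placeholders for your account and user:

snowsql -a <account_identifier> -u <user_name>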

Now let’s have a look at the command to create a stage:

Create a named internal stage my_oracle_stage  and assign some default options:

create or replace stage my_oracle_stage

copy_options= (on_error='skip_file')

file_format= (type = 'CSV' field_delimiter = ',' skip_header = 1);

PUT is the command used to stage files to an internal Snowflake stage. The syntax of the PUT command is:

PUT file://<path_to_your_file>/<your_filename> @<internal_stage_name>

Eg:

Upload a file items_data.csv from the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:

put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;

While uploading the file you can set many configurations to enhance the data load performance, such as the degree of parallelism and automatic compression. Complete information can be found here.
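For example, a PUT with a couple of these options set might look like this (the values shown are illustrative):

put file:///tmp/oracle_data/data/items_data.csv @oracle_stage
  parallel = 8
  auto_compress = true;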

External Stage

Let us now look at the external staging option and understand how it differs from the internal stage. Snowflake supports any accessible Amazon S3 bucket or Microsoft Azure container as an external staging location. You can create a stage pointing to that location, and data can be loaded directly into a Snowflake table through it. There is no need to move the data to an internal stage.

If you want to create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required. If the data needs to be decrypted before loading into Snowflake, the proper keys have to be provided. Here is an example of creating an external stage:

create or replace stage oracle_ext_stage url='s3://snowflake_oracle/data/load/files/'

credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z')

encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');

Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or the AWS SDK in your favourite programming language. Python’s boto3 is a popular choice in such circumstances. Once the data is in S3, an external stage can be created pointing to that location.
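For instance, if the AWS CLI is configured with credentials that can write to the bucket, a simple upload to the location used in the stage example above could look like this (bucket and paths are illustrative):

# Copy the extracted file to the S3 path the external stage points to
aws s3 cp /tmp/oracle_data/data/items_data.csv s3://snowflake_oracle/data/load/files/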

Step 4: Copy staged files to Snowflake table

So far, you have extracted data from Oracle, uploaded it to an S3 location and created an external Snowflake stage pointing to that location. The next step is to copy the data to the table. The command used to do this is COPY INTO. Note: to execute the COPY INTO command, compute resources in Snowflake virtual warehouses are required, and your Snowflake credits will be utilised.

Eg:

To load from a named internal stage:

copy into oracle_table

from @oracle_stage;

To load from the external stage (only one file is specified):

copy into my_ext_stage_table

from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;

You can even copy directly from an external location without creating a stage:

copy into oracle_table

from 's3://mybucket/oracle_snow/data/files'

credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')

encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')

file_format = (format_name = csv_format);

Files can be specified using patterns:

copy into oracle_pattern_table

from @oracle_stage

file_format = (type = 'TSV')

pattern='.*/.*/.*[.]csv[.]gz';

Some commonly used options for CSV file loading using the COPY command:

  • DATE_FORMAT – Specify any custom date format used in the file so that Snowflake can parse it properly.
  • TIME_FORMAT – Specify any custom time format used in the file.
  • COMPRESSION – If your data is compressed, specify the algorithm used to compress it.
  • RECORD_DELIMITER – The character that separates records (lines) in the file.
  • FIELD_DELIMITER – The character separating fields in the file.
  • SKIP_HEADER – The number of header lines to skip while loading data into the table.

There are many other useful options. For the full list, click here.
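Putting a few of these options together, an example of loading a gzip-compressed CSV with a custom date format (reusing the stage and table names from the earlier illustrations) could look like this:

copy into oracle_table
from @oracle_stage
file_format = (type = 'CSV'
               field_delimiter = ','
               skip_header = 1
               date_format = 'DD-MON-YYYY'
               compression = 'GZIP');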

Update Snowflake Table

We have discussed how to extract data incrementally from an Oracle table. Data extracted this way cannot be inserted into the target table directly, because it will contain both new and updated records that have to be treated accordingly.

Earlier in this document, we mentioned that Snowflake supports SQL constraints. Another pleasant surprise is Snowflake’s support for row-level data manipulation, which makes it easier to handle delta data loads. The basic idea is to load the incrementally extracted data into an intermediate or temporary table and then modify the records in the final table using the data in the intermediate table. The three approaches mentioned below are generally used for this.

  1. Update the rows in the target table with new data (with the same keys), then insert the rows from the intermediate or landing table that are not yet in the final table.
    UPDATE oracle_target_table t
    SET value = s.value
    FROM landing_delta_table s
    WHERE t.id = s.id;
    INSERT INTO oracle_target_table (id, value)
    SELECT id, value
    FROM landing_delta_table
    WHERE id NOT IN (SELECT id FROM oracle_target_table);
  2. Delete the rows from the target table that are also present in the landing table, then insert all rows from the landing table into the final table. The final table will then have the latest data without duplicates.
    DELETE FROM oracle_target_table
    WHERE id IN (SELECT id FROM landing_delta_table);
    INSERT INTO oracle_target_table (id, value)
    SELECT id, value FROM landing_delta_table;
  3. MERGE statement – The standard SQL MERGE statement combines inserts and updates. It is used to apply changes from the landing table to the target table with one SQL statement.
    MERGE into oracle_target_table t1 using landing_delta_table t2 on t1.id = t2.id
    
    WHEN matched then update set value = t2.value
    
    WHEN not matched then INSERT (id, value) values (t2.id, t2.value);

This method works when you have a comfortable project timeline and a pool of experienced engineers who can build and maintain the pipeline. However, it comes with significant coding and maintenance overhead.

There are much easier and simpler ways to achieve this. 

 EASY WAY TO MOVE DATA FROM ORACLE TO SNOWFLAKE

Using a fully managed Data Pipeline Platform such as Hevo (also an official Snowflake ETL partner) can help you move your data from Oracle DB to Snowflake in real time without writing any code. Hevo automates the entire data migration in a secure and reliable manner. Here are the simple steps to move data from Oracle DB to Snowflake using Hevo:

  • Authenticate and Connect to your Oracle Database
  • Select Replication Mode:
    • Load data through a custom SQL query
    • Select the Oracle DB tables to be loaded via UI
    • Load data using Redo logs
  • Configure the Snowflake data warehouse for data load

With Hevo, you can load data consistently and reliably from Oracle to Snowflake. Hevo’s fault-tolerant architecture ensures zero data loss in the migration process. With Hevo as part of your infrastructure, you can shift your focus to the right projects instead of worrying about the availability of data.

Beyond Oracle, Hevo has 100s of additional data source integrations that work out of the box. This gives you the flexibility to scale your data infrastructure up and down as your business needs evolve.

Sign up for a 14-Day Free Trial with Hevo and experience code-free, hassle-free data loading from Oracle to Snowflake.
