Are you trying to connect Oracle to Snowflake? Have you looked all over the internet to find a solution for it? If yes, then you are in the right place. Snowflake is a fully managed Data Warehouse, whereas Oracle is a modern Database Management System. Loading your data from Oracle to Snowflake provides data-driven insights and solutions.
This article will give you a brief overview of Oracle Database and Snowflake. You will also get to know how you can easily set up Oracle to Snowflake Integration using two methods. In the end, you will be in the position to choose the best of both methods based on your business requirements. Read along to decide which method of connecting Oracle to Snowflake is best for you.
Table of Contents
- Introduction to Oracle Database
- Introduction to Snowflake
- Benefits of Replicating Data from Oracle to Snowflake
- Methods to Set up Oracle to Snowflake Integration
You will have a much easier time understanding the ways for setting up the Oracle to Snowflake Integration if you have gone through the following aspects:
- An active Oracle account.
- An active Snowflake account.
- Working knowledge of Databases and Data Warehouses.
- Working knowledge of Structured Query Language (SQL).
- Clear idea regarding the type of data to be transferred.
Introduction to Oracle Database
Oracle Database is one of the popular Relational Database Management System which is also known as OracleDB or Oracle. Lawrence Ellison and other engineers created it in 1977. It is one of the most widely used Relational Database engines for storing, organizing, and retrieving data in the IT sector.
Oracle was the first database to be built specifically for enterprise grid computing and Data Warehousing. Enterprise grid computing is the most cost-effective and flexible approach to handle data and applications. For interacting with the database, it utilizes SQL queries as a language.
To know more about Oracle Database, visit this link.
Introduction to Snowflake
Snowflake is one of the popular fully managed, Cloud Data Warehouses that is available as a Software-as-a-Service (SaaS) to clients. The term “fully managed” refers to the fact that users will not be responsible for any back-end tasks such as server installation, maintenance, and so on.
Moreover, the customer has the option of choosing which cloud provider to use for their Snowflake instance. This is useful for businesses that use various cloud providers. Snowflake querying adheres to the ANSI SQL standard and supports both fully organized and semi-structured data, such as JSON, Parquet, and XML.
To know more about Snowflake, visit this link.
Benefits of Replicating Data from Oracle to Snowflake
Many business applications are replicating data from Oracle to Snowflake, not only because of the superior scalability but also because of the other advantages that set Snowflake apart from traditional Oracle environments. Some of the benefits of replicating data from Oracle to Snowflake include:
- Snowflake promises high computational power. In case there are many concurrent users running complex queries, the computational power of the Snowflake instance can be changed dynamically. This ensures that there is less waiting time for complex query executions.
- The agility and elasticity offered by the Snowflake Cloud Data warehouse solution are unmatched. This gives you the liberty to scale only when you needed and pay for what you use.
- Snowflake is a completely managed service. This means you can get your analytics projects running with minimal engineering resources.
- Snowflake gives you the liberty to work seamlessly with Semi-structured data. Analyzing this in Oracle is super hard.
Methods to Set up Oracle to Snowflake Integration
This method of connecting Oracle to Snowflake involves the use of SQL*Plus to convert Oracle data into CSV format and then staging the files to S3. Finally, you need to copy the staged files into the Snowflake Data Warehouse. This method requires more effort as well as more engineering bandwidth. Moreover, coding and maintenance overhead is also associated with this method to move data from Oracle to Snowflake.
Hevo Data is an automated Data Pipeline platform that can move your data from Oracle to Snowflake very quickly without writing a single line of code. It is simple, hassle-free, and reliable!
Hevo will automate your data flow from Oracle to Snowflake, saving engineering bandwidth to get utilized elsewhere. Hevo supports data replication using Oracle Redo Log which updates data in your warehouse in real-time without impacting your production database. Moreover, its Fault-Tolerant architecture makes sure that your data is secure and consistent.
Hevo offers a fully-managed solution to set up data integration from Oracle and 100+ data sources (including 40+ free data sources) and will let you directly load data to a Data Warehouse such as Snowflake, Amazon Redshift, Google BigQuery, or a destination of your choice.
Methods to Set up Oracle to Snowflake Integration
There are many ways of loading data from Oracle to Snowflake. In this blog, you will be going to look into two popular ways. Also you can read our article on Snowflake Excel integration.
In the end, you will have a good understanding of each of these two methods. This will help you to make the right decision based on your use case:
- Method 1: Manual ETL Process to Set up Oracle to Snowflake Integration
- Method 2: Using Hevo Data to Set up Oracle to Snowflake Integration
Method 1: Manual ETL Process to Set up Oracle to Snowflake Integration
Oracle and Snowflake are two distinct data storage options since their structures are very dissimilar. Although there is no direct way to load data from Oracle to Snowflake, using a mediator that connects to both Oracle and Snowflake can ease the process. Steps to move data from Oracle to Snowflake can be categorized as follows:
- Step 1: Extract Data from Oracle to CSV using SQL*Plus
- Step 2: Data Type Conversion and Other Transformations
- Step 3: Staging Files to S3
- Step 4: Finally, Copy Staged Files to the Snowflake Table
Let us go through these steps to connect Oracle to Snowflake in detail.
Step 1: Extract data from Oracle to CSV using SQL*Plus
SQL*Plus is a query tool installed with every Oracle Database Server or Client installation. It can be used to query and redirect the result of an SQL query to a CSV file. The command used for this is: Spool
-- Turn on the spool spool spool_file.txt -- Run your Query select * from dba_table; -- Turn of spooling spool off;
- The spool file will not be visible until the command is turned off
- If the Spool file doesn’t exist already, a new file will be created. If it exists, it will be overwritten by default. There is an append option from Oracle 10g which can be used to append to an existing file.
Most of the time the data extraction logic will be executed in a Shell script. Here is a very basic example script to extract full data from an Oracle table
#!/usr/bin/bash FILE="students.csv" sqlplus -s user_name/password@oracle_db <<EOF SET PAGESIZE 35000 SET COLSEP "|" SET LINESIZE 230 SET FEEDBACK OFF SPOOL $FILE SELECT * FROM EMP; SPOOL OFF EXIT EOF#!/usr/bin/bash FILE="emp.csv" sqlplus -s scott/tiger@XE <<EOF SET PAGESIZE 50000 SET COLSEP "," SET LINESIZE 200 SET FEEDBACK OFF SPOOL $FILE SELECT * FROM STUDENTS; SPOOL OFF EXIT EOF
- SET PAGESIZE – The number of lines per page. The header line will be there on every page.
- SET COLSEP – Setting the column separator.
- SET LINESIZE – The number of characters per line. The default is 80. You can set this to a value in a way that the entire record comes within a single line.
- SET FEEDBACK OFF – In order to prevent logs from appearing in the CSV file, the feedback is put off.
- SPOOL $FILE – The filename where you want to write the results of the query.
- SELECT * FROM STUDENTS – The query to be executed to extract data from the table.
- SPOOL OFF – To stop writing the contents of the SQL session to the file.
Incremental Data Extract
As discussed in the above section, once Spool is on, any SQL can be run and the result will be redirected to the specified file. To extract data incrementally, you need to generate SQL with proper conditions to select only records that are modified after the last data pull.
select * from students where last_modified_time > last_pull_time and last_modified_time <= sys_time.
Now the result set will have only changed records after the last pull.
Step 2: Data type conversion and formatting
While transferring data from Oracle to Snowflake, data might have to be transformed as per business needs. Apart from such use case-specific changes, there are certain important things to be noted for smooth data movement. Also, check out Oracle to MySQL Integration.
- Many errors can be caused by character sets mismatch in source and target. Note that Snowflake supports all major character sets including UTF-8 and UTF-16. The full list can be found here.
- While moving data from Oracle to Big Data systems most of the time data integrity might be compromised due to lack of support for SQL constraints. Fortunately, Snowflake supports all SQL constraints like UNIQUE, PRIMARY KEY, FOREIGN KEY, NOT NULL constraints which is a great help for making sure data has moved as expected.
- Snowflake’s type system covers most primitive and advanced data types which include nested data structures like struct and array. Below is the table with information on Oracle data types and the corresponding Snowflake counterparts.
|Oracle Data Type||Snowflake Data Type|
|FLOAT||CHAR(Up to 16MB)|
|VARCHAR2/NVARCHAR2||VARCHAR(Up to 16MB)|
|TIMESTAMP with TIMEZONE||TIMESTAMP_LTZ|
TIMESTAMP WITH LOCAL TIME ZONE
|BINARY_FLOAT/BINARY_DOUBLE||BINARY(Up to 8M)|
|CLOB / VARCHAR2||VARIANT|
|CLOB / VARCHAR2||ARRAY|
- Often, date and time formats require a lot of attention while creating data pipelines. Snowflake is quite flexible here as well. If a custom format is used for dates or times in the file to be inserted into the table, this can be explicitly specified using “File Format Option”. The complete list of date and time formats can be found here.
Step 3: Stage Files to S3
To load data from Oracle to Snowflake, it has to be uploaded to a cloud staging area first. If you have your Snowflake instance running on AWS, then the data has to be uploaded to an S3 location that Snowflake has access to. This process is called staging. The snowflake stage can be either internal or external.
If you chose to go with this option, each user and table will be automatically assigned to an internal stage which can be used to stage data related to that user or table. Internal stages can be even created explicitly with a name.
- For a user, the default internal stage will be named as ‘@~’.
- For a table, the default internal stage will have the same name as the table.
- There is no option to alter or drop an internal default stage associated with a user or table.
- Unlike named stages file format options cannot be set to default user or table stages.
If an internal stage is created explicitly by the user using SQL statements with a name, many data loading options can be assigned to the stage like file format, date format, etc. When data is loaded to a table through this stage those options are automatically applied.
Note: The rest of this document discusses many Snowflake commands. Snowflake comes with a very intuitive and stable web-based interface to run SQL and commands. However, if you prefer to work with a lightweight command-line utility to interact with the database you might like SnowSQL – a CLI client available in Linux/Mac/Windows to run Snowflake commands. Read more about the tool and options here.
Now let’s have a look at commands to create a stage:
Create a named internal stage my_oracle_stage and assign some default options:
create or replace stage my_oracle_stage copy_options= (on_error='skip_file') file_format= (type = 'CSV' field_delimiter = ',' skip_header = 1);
PUT is the command used to stage files to an internal Snowflake stage. The syntax of the PUT command is :
PUT file://path_to_your_file/your_filename internal_stage_name
Upload a file items_data.csv in the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage
put file:////tmp/oracle_data/data/items_data.csv @oracle_stage;
While uploading the file you can set many configurations to enhance the data load performance like the number of parallelisms, automatic compression, etc. Complete information can be found here.
Let us now look at the external staging option and understand how it differs from the internal stage. Snowflake supports any accessible Amazon S3 or Microsoft Azure as an external staging location. You can create a stage to pointing to the location data that can be loaded directly to the Snowflake table through that stage. No need to move the data to an internal stage.
If you want to create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required. If data needs to be decrypted before loading to Snowflake, proper keys are to be provided. Here is an example to create an external stage:
create or replace stage oracle_ext_stage url='s3://snowflake_oracle/data/load/files/' credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z'); encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');
Once data is extracted from Oracle it can be uploaded to S3 using the direct upload option or using AWS SDK in your favorite programming language. Python’s boto3 is a popular one used under such circumstances. Once data is in S3, an external stage can be created to point to that location.
Step 4: Copy staged files to Snowflake table
So far – you have extracted data from Oracle, uploaded it to an S3 location, and created an external Snowflake stage pointing to that location. The next step is to copy data to the table. The command used to do this is COPY INTO. Note: To execute the COPY INTO command, compute resources in Snowflake virtual warehouses are required and your Snowflake credits will be utilized.
To load from a named internal stage
copy into oracle_table from @oracle_stage;
Loading from the external stage. Only one file is specified.
copy into my_ext_stage_table from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
You can even copy directly from an external location without creating a stage:
copy into oracle_table from s3://mybucket/oracle_snow/data/files credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY') encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=') file_format = (format_name = csv_format);
Files can be specified using patterns
copy into oracle_pattern_table from @oracle_stage file_format = (type = 'TSV') pattern='.*/.*/.*[.]csv[.]gz';
Some commonly used options for CSV file loading using the COPY command
- DATE_FORMAT – Specify any custom date format you used in the file so that Snowflake can parse it properly.
- TIME_FORMAT – Specify any custom date format you used in the file.
- COMPRESSION – If your data is compressed, specify algorithms used to compress.
- RECORD_DELIMITER – To mention lines separator character.
- FIELD_DELIMITER – To indicate the character separating fields in the file.
- SKIP_HEADER – This is the number of header lines to skipped while inserting data into the table.
There are many other useful options. For the full list click here.
Update Snowflake Table
We have discussed how to extract data incrementally from the Oracle table. Once data is extracted incrementally, it cannot be inserted into the target table directly. There will be new and updated records that have to be treated accordingly.
Earlier in this document, we mentioned that Snowflake supports SQL constraints. Adding to that, another surprising feature from Snowflake is support for row-level data manipulations which makes it easier to handle delta data load. The basic idea is to load incrementally extracted data into an intermediate or temporary table and modify records in the final table with data in the intermediate table. The three methods mentioned below are generally used for this.
1. Update the rows in the target table with new data (with the same keys). Then insert new rows from the intermediate or landing table which are not in the final table.
UPDATE oracle_target_table t SET t.value = s.value FROM landing_delta_table in WHERE t.id = in.id;
INSERT INTO oracle_target_table (id, value) SELECT id, value FROM landing_delta_table WHERE NOT id IN (SELECT id FROM oracle_target_table);
2. Delete rows from the target table which are also in the landing table. Then insert all rows from the landing table to the final table. Now, the final table will have the latest data without duplicates
DELETE .oracle_target_table f WHERE f.id IN (SELECT id from landing_table); INSERT oracle_target_table (id, value) SELECT id, value FROM landing_table;
3. MERGE Statement – Standard SQL merge statement which combines Inserts and updates. It is used to apply changes in the landing table to the target table with one SQL statement
MERGE into oracle_target_table t1 using landing_delta_table t2 on t1.id = t2.id WHEN matched then update set value = t2.value WHEN not matched then INSERT (id, value) values (t2.id, t2.value);
This method of connecting Oracle to Snowflake works when you have a comfortable project timeline and a pool of experienced engineering resources that can build and maintain the pipeline. However, the method mentioned above comes with a lot of coding and maintenance overhead.
Limitations of Manual ETL Process
- Cost: The cost of hiring an ETL Developer to construct an oracle to Snowflake ETL pipeline might not be favorable in terms of expenses. Method 1 is not a cost-efficient option.
- Maintenance: Maintenance is very important for the data processing system; hence your ETL codes need to be updated regularly due to the fact that development tools upgrade their dependencies and industry standards change. Also, maintenance consumes precious engineering bandwidth which might be utilized elsewhere.
- Scalability: Indeed, scalability is paramount! ETL systems can fail over time if conditions for processing fails. For example, what if incoming data increases 10X, can your processes handle such a sudden increase in load? A question like this requires serious thinking while opting for the manual ETL Code approach.
Method 2: Using Hevo Data to Set up Oracle to Snowflake Integration
Hevo Data, a No-code Data Pipeline, helps you directly transfer data from Oracle to Snowflake and other Data Warehouses, BI tools, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from Oracle but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
The steps to load data from Oracle to Snowflake using Hevo Data are as follow:
- Connect your Oracle account to Hevo’s platform. Hevo has an in-built Oracle Integration that connects to your account within minutes.
- Select Snowflake as your destination and start moving your data.
With this, you have successfully set up Oracle to Snowflake Integration using Hevo Data.
Here are more reasons to try Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo is Built to Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support call
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
With continuous Real-Time data movement, Hevo allows you to combine Oracle data along with your other data sources and seamlessly load it to Snowflake with a no-code, easy-to-setup interface. Try our 14-day full-feature access free trial!Get Started with Hevo for Free
In this article, you have learned about Oracle, Snowflake, and two different approaches to set up Oracle to Snowflake Integration. The manual method involves the use of SQL*Plus and also staging the files to Amazon S3 before copying them into the Snowflake Data Warehouse. This method requires more effort and engineering bandwidth to connect Oracle to Snowflake. Whereas, if you require Real-Time data replication and looking for a fully automated real-time solution, then Hevo is the right choice for you.
Businesses can use automated platforms like Hevo Data to set this integration and handle the ETL process. It helps you directly transfer data from a source of your choice to a Data Warehouse, Business Intelligence tools, or any other desired destination in a fully automated and secure manner without having to write any code and will provide you with a hassle-free experience.
Want to try Hevo?
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.