Oracle to BigQuery: 2 Easy Methods

on Tutorial • September 2nd, 2021

At a time when data is being termed the new oil, it is extremely important for businesses to have a data management system that suits their needs and positions them to take full advantage of being data-driven. Data is being generated at rapid rates, and businesses need database systems that can scale up and down effortlessly without extra computational cost.

Enterprises are exhausting a huge chunk of their data budgets in just maintaining their present physical database systems instead of directing the said budget towards gaining tangible insights from their data. This scenario is far from ideal and is the reason why moving your Oracle data to a cloud-based Data Warehouse like Google BigQuery is no longer a want but a need. This post provides a step-by-step walkthrough on how to migrate data from Oracle to BigQuery.

Introduction to Oracle

Oracle Database is a relational database system that helps businesses store and retrieve data. Oracle DB (as it's fondly called) provides a combination of high-level technology and integrated business solutions, a non-negotiable requisite for businesses that store and access huge amounts of data. This makes it one of the world's most trusted database management systems.

Introduction to Google BigQuery

Google BigQuery is a cloud-based serverless Data Warehouse for processing large amounts of data at a rapid rate. It is called serverless because it automatically scales while running, depending on data volume and query complexity. Hence, there is no need to spend a huge part of your database budget on on-site infrastructure and database administrators. BigQuery is a standout performer when it comes to analysis and data warehousing. It provides its customers with the freedom and flexibility to create a plan of action that fits their entire business structure.

Performing ETL from Oracle to BigQuery

There are majorly two ways of migrating data from Oracle to BigQuery. The two ways are:

Method 1: Using Custom ETL Scripts to Connect Oracle to BigQuery

This method involves a 5-step process of utilizing Custom ETL Scripts to establish a connection from Oracle to BigQuery in a seamless fashion. There are considerable upsides to this method, and a few limitations as well.

Method 2: Using Hevo to Connect Oracle to BigQuery

Hevo provides a hassle-free solution and helps you set up a connection from Oracle to BigQuery without any intervention in an effortless manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. 

Get Started with Hevo for Free

Hevo’s pre-built integration with Oracle (among 100+ Sources) will take full charge of the data transfer process, allowing you to focus on key business activities. 

In this post, we will cover the first method (Custom ETL Scripts) in detail. Towards the end of the post, you can also find a quick comparison of both data migration methods so that you can evaluate your requirements and choose wisely.

Methods to Connect Oracle to BigQuery

Here are the methods you can use to connect Oracle to BigQuery in a seamless fashion:

Method 1: Using Custom ETL Scripts to Connect Oracle to BigQuery

The steps involved in migrating data from Oracle DB to BigQuery are as follows:

  • Step 1: Export Data from Oracle DB to CSV Format
  • Step 2: Extract Data from Oracle DB
  • Step 3: Upload to Google Cloud Storage
  • Step 4: Upload to BigQuery from GCS
  • Step 5: Update the Target Table in BigQuery

Let's take a step-by-step look at each of these steps.

Step 1: Export Data from Oracle DB to CSV Format

  • BigQuery does not support the binary format produced by Oracle DB, so we first have to export our data to a CSV (comma-separated values) file.
  • Oracle SQL Developer is the preferred tool for this task. It is a free integrated development environment that makes it exceptionally simple to develop and manage Oracle databases, both on-premise and in the cloud, and it also serves as a migration tool for moving databases to and from Oracle. It can be downloaded for free from Oracle's website.
  • Open Oracle SQL Developer and right-click the table name in the object tree view.
  • Click Export.
  • Select CSV; the Export Data window will pop up.
  • Select the Format tab and set the format to CSV.
  • Enter the preferred file name and location.
  • Select the Columns tab and verify the columns you wish to export.
  • Select the Where tab and add any criteria you wish to filter the data with.
  • Click Apply.

Step 2: Extract Data from Oracle DB

  • The COPY_FILE procedure in the DBMS_FILE_TRANSFER package is used to copy a file within a local file system. The following example copies a CSV file named client.csv from the /usr/admin/source directory to the /usr/admin/destination directory (as client_copy.csv) on a local file system.
  • The SQL command CREATE DIRECTORY is used to create a directory object for the directory that contains the CSV file. For instance, to create a directory object called source for the /usr/admin/source directory, execute the following code block:
CREATE DIRECTORY source AS '/usr/admin/source';
  • Use the SQL command CREATE DIRECTORY to create a directory object for the directory into which you want to copy the CSV file. An illustration is given below
CREATE DIRECTORY dest_dir AS '/usr/admin/destination';

         where dest_dir is the destination directory object.

  • Grant required access to the user who is going to run the COPY_FILE procedure. An illustration is given below:
GRANT EXECUTE ON DBMS_FILE_TRANSFER TO admin;
GRANT READ ON DIRECTORY source TO admin;
GRANT WRITE ON DIRECTORY dest_dir TO admin;
  • Connect as the admin user, providing the password when prompted:
CONNECT admin
  • Execute  the COPY_FILE procedure to copy the file:
BEGIN
 DBMS_FILE_TRANSFER.COPY_FILE(
       source_directory_object       => 'source',
       source_file_name              => 'client.csv',
       destination_directory_object  => 'dest_dir',
       destination_file_name         => 'client_copy.csv');
END;
/
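Note that DBMS_FILE_TRANSFER operates on directory objects from inside the database. If you have OS-level access to the database host, the equivalent copy is a one-liner. A sketch is shown below, using relative source/destination directories as stand-ins for the /usr/admin/source and /usr/admin/destination paths above:

```python
import shutil
from pathlib import Path

# Stand-ins for /usr/admin/source and /usr/admin/destination
src = Path("source")
dst = Path("destination")
src.mkdir(exist_ok=True)
dst.mkdir(exist_ok=True)
(src / "client.csv").write_text("id,value\n1,a\n")

# OS-level equivalent of DBMS_FILE_TRANSFER.COPY_FILE
shutil.copyfile(src / "client.csv", dst / "client_copy.csv")
```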

Step 3: Upload to Google Cloud Storage

Once the data has been extracted from Oracle DB, the next step is to upload it to GCS. There are multiple ways this can be achieved; the various methods are explained below.

  • Using gsutil

gsutil is GCP's command-line tool for handling objects and buckets in GCS. It provides an easy way to load a file from your local machine to GCS.

To copy a file to GCS:

gsutil cp client_copy.csv gs://my-bucket/path/to/folder/

To copy an entire folder to GCS:

gsutil cp -r dir gs://my-bucket/path/to/parent/
  • Using Web console

An alternative means of uploading the data from your local machine to GCS is the web console. To use the web console, follow the steps laid out below.

  • Log in to the GCP console. You need a working Google account to make use of GCP. Click the hamburger menu, select Storage from the drop-down, and navigate to Browser in the left tab.


  • Create a new bucket where you will migrate your data to. Make sure the name you choose is globally unique.



  • Click on the bucket you created and select upload files. This action takes you to your local directory where you choose the file you want to upload.
  • The data upload process starts immediately, and a progress bar is shown. Wait for it to complete; afterwards, the file will be visible in the bucket.

Step 4: Upload to BigQuery from GCS

To load data into BigQuery, you can use either the web console UI or the command line. Let us briefly look at both methods.

First, let's look at uploading the data using the web console UI.

  • The first step is to go to the BigQuery console under the hamburger menu.
  • Create a dataset and fill in the drop-down form.
  • Click the dataset you created. A 'Create table' icon will appear below the Query Editor; select it.
  • Fill in the drop-down list and create the table. To finish loading the table, the schema has to be specified. We will do this using the command-line tool, as interacting with GCS from the command line is much easier and more straightforward.

To access the command line, click the Activate Cloud Shell icon in the top bar of the GCP console.

The syntax of the bq load command is shown below:

bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]

[LOCATION] is an optional parameter that represents your Location.
[FORMAT] is to be set to CSV.
[DATASET] represents an existing dataset.
[TABLE] is the table name into which you're loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema must be a local JSON file or inline. 
Note: Instead of supplying a schema definition, you can pass the --autodetect flag.

You can specify your schema using the bq command line. An illustration using a JSON schema file is shown below:

bq --location=US load --source_format=CSV your_dataset.your_table gs://your_bucket/your_data.csv ./your_schema.json
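For reference, the your_schema.json file in the command above could look like the following. The field names here are hypothetical and must match your CSV's columns:

```json
[
  {"name": "id",    "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "value", "type": "STRING",  "mode": "NULLABLE"}
]
```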

The schema can also be autodetected. An example is shown below:

bq --location=US load --autodetect --source_format=CSV your_dataset.your_table  gs://mybucket/data.csv
  • The bq command-line tool offers three options for writing to an existing table. We will use it to copy data into the table created above.

The options are:

a) Overwrite the table

bq --location=US load --autodetect --replace --source_format=CSV your_dataset_name.your_table_name gs://bucket_name/path/to/file/file_name.csv

b) Append the table

bq --location=US load --autodetect --noreplace --source_format=CSV your_dataset_name.your_table_name gs://bucket_name/path/to/file/file_name.csv

c) Add a new field to the target table. Here, the schema is given an extra field:

bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV your_dataset.your_table gs://mybucket/your_data.csv ./your_schema.json

Step 5: Update the Target Table in BigQuery

The data loaded in the steps above has not yet been fully applied to the target table. Because GCS serves as a staging area for BigQuery loads, the data sits in an intermediate table before being merged into the final table.

There are two ways of updating the final table as explained below.

  • Update the rows in the final table and insert new rows from the intermediate table.
UPDATE final_table t
SET t.value = s.value
FROM intermediate_data_table s
WHERE t.id = s.id;

INSERT final_table (id, value)
SELECT id, value
FROM intermediate_data_table
WHERE id NOT IN (SELECT id FROM final_table);
  • Delete the rows in the final table that also exist in the intermediate table, then insert all rows from the intermediate table.
DELETE final_table f
WHERE f.id IN (SELECT id FROM intermediate_data_table);

INSERT data_set_name.final_table (id, value)
SELECT id, value
FROM data_set_name.intermediate_data_table;
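The update-then-insert pattern above can be sketched end-to-end. The demo below uses sqlite3 with the hypothetical final_table / intermediate_data_table names from the snippets above; note that BigQuery's UPDATE ... FROM syntax differs from the correlated-subquery form sqlite requires:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_table (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE intermediate_data_table (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO final_table VALUES (1, 'old'), (2, 'keep');
    INSERT INTO intermediate_data_table VALUES (1, 'new'), (3, 'fresh');
""")

# 1) Update rows in the final table that also exist in the staging table
conn.execute("""
    UPDATE final_table
    SET value = (SELECT s.value FROM intermediate_data_table s
                 WHERE s.id = final_table.id)
    WHERE id IN (SELECT id FROM intermediate_data_table)
""")

# 2) Insert staging rows that are not yet in the final table
conn.execute("""
    INSERT INTO final_table (id, value)
    SELECT id, value FROM intermediate_data_table
    WHERE id NOT IN (SELECT id FROM final_table)
""")

print(sorted(conn.execute("SELECT id, value FROM final_table")))
# → [(1, 'new'), (2, 'keep'), (3, 'fresh')]
```

Existing row 1 is updated, row 2 is untouched, and row 3 is inserted, which is exactly the behavior the two BigQuery statements above produce.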

Limitations of Using Custom ETL Scripts to Connect Oracle to BigQuery

  • Writing custom code would add value only if you are looking to move data once from Oracle to BigQuery.
  • When a use case arises that needs data to be synced from Oracle into BigQuery on an ongoing basis or in real time, you would have to move it incrementally. This process is called Change Data Capture (CDC). The custom-code method mentioned above fails here; you would have to write additional lines of code to achieve it.
  • When you build custom SQL scripts to extract a subset of the data set in Oracle DB, there is a chance that the script breaks as the source schema keeps changing or evolving.
  • Often, there arises a need to transform the data (Eg: hide Personally Identifiable Information) before loading it into BigQuery. Achieving this would need you to add additional time and resources to the process.

In a nutshell, ETL scripts are fragile with a high propensity to break. This makes the entire process error-prone and becomes a huge hindrance in the path of making accurate, reliable data available in BigQuery.

Method 2: Using Hevo to Connect Oracle to BigQuery


Using a fully managed Data Pipeline platform like Hevo can help you get your data from Oracle to BigQuery in minutes, fully automated and without writing a single line of code.

Sign up here for a 14-Day Free Trial!

Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. Here are the steps to replicate Oracle to BigQuery using Hevo:

  • Step 1: Connect to your Oracle database by providing the Pipeline Name, Database Host, Database Port, Database User, Database Password, and Service Name.
Oracle Source Config
  • Step 2: Configure Oracle to BigQuery Warehouse migration by providing the Destination Name, Project ID, GCS Bucket, Dataset ID, Enabling Stream Inserts, and Sanitize Table/Column Names.
Destination Config

Check out what makes Hevo amazing:

  • Real-Time Data Transfer: Hevo, with its strong integration with 100+ sources, allows you to transfer data quickly and efficiently. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Tremendous Connector Availability: Hevo houses a large variety of connectors and lets you bring in data from numerous Marketing & SaaS applications, databases, etc. such as Google Analytics 4, Google Firebase, Airflow, HubSpot, Marketo, MongoDB, Oracle, Salesforce, Redshift, etc. in an integrated and analysis-ready form.
  • Simplicity: Using Hevo is easy and intuitive, ensuring that your data is exported in just a few clicks. 
  • Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

Conclusion

This blog talks about the two methods you can use to connect Oracle to BigQuery in a seamless fashion. It also gives a brief overview of Oracle and BigQuery before diving into the two methods.

Visit our Website to Explore Hevo

With Hevo, you can achieve simple and efficient Data Migration from Oracle to BigQuery. Hevo can help you move data from not just Oracle but 100s of additional data sources. This makes Hevo the right partner to be by your side as your business scales.

Sign Up for a 14-Day Free Trial with Hevo and experience a seamless, hassle-free data migration experience from Oracle to BigQuery. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
