In a time when data is being called the new oil, businesses need a data management system that suits their needs and positions them to take full advantage of being data-driven.
Data is being generated at a rapid rate, and businesses need database systems that can scale up and down effortlessly without incurring unnecessary computational cost.
Enterprises exhaust a huge chunk of their data budgets just maintaining their existing physical database systems instead of directing that budget toward gaining tangible insights from their data.
This scenario is far from ideal, and it is why moving your Oracle data to a cloud-based Data Warehouse like Google BigQuery is no longer a want but a need.
This post provides a step-by-step walkthrough on how to migrate data from Oracle to BigQuery.
Introduction to Oracle
Oracle Database is a relational database management system that helps businesses store and retrieve data.
Oracle DB (as it is fondly called) provides a combination of high-level technology and integrated business solutions, a non-negotiable requisite for businesses that store and access huge amounts of data. This makes it one of the world's most trusted database management systems.
Introduction to Google BigQuery
Google BigQuery is a cloud-based serverless Data Warehouse for processing large volumes of data at a rapid rate. It is called serverless because it automatically scales as queries run, depending on the data volume and query complexity.
Hence, there is no need to spend a huge part of your database budget on on-site infrastructure and database administrators.
BigQuery is a standout performer when it comes to analysis and data warehousing.
It provides its customers with the freedom and flexibility to build an analytics setup that fits their entire business structure.
There are two main ways of migrating data from Oracle to BigQuery:
Method 1: Using Custom ETL Scripts to Connect Oracle to BigQuery
This method involves a 5-step process that uses custom ETL scripts to establish a connection from Oracle to BigQuery in a seamless fashion. There are considerable upsides to this method, as well as a few limitations.
Method 2: Using Hevo to Connect Oracle to BigQuery
Hevo streamlines the process of connecting Oracle to BigQuery, enabling seamless data transfer and transformation between the two platforms. This ensures efficient data migration, accurate analytics, and comprehensive insights by leveraging BigQuery’s advanced analytics capabilities.
In this post, we will cover the first method (custom ETL scripts) in detail. Toward the end of the post, you can also find a quick comparison of both data migration methods so that you can evaluate your requirements and choose wisely.
Methods to Connect Oracle to BigQuery
Here are the methods you can use to set up Oracle to BigQuery migration in a seamless fashion:
Method 1: Using Custom ETL Scripts to Connect Oracle to BigQuery
Let's take a step-by-step look at this method.
Step 1: Export Data from Oracle DB to CSV Format
- BigQuery does not support the binary format produced by Oracle DB, so we will have to export our data to a CSV (comma-separated values) file.
- Oracle SQL Developer is the preferred tool for this task. It is a free integrated development environment that makes it simple to develop and manage Oracle databases both on-premise and in the cloud, and it also serves as a migration tool for moving your database to and from Oracle. Oracle SQL Developer can be downloaded for free from the Oracle website.
- Open the Oracle SQL Developer tool, and right-click the table name in the object tree view.
- Click on Export.
- The Export Data window will pop up.
- In the Format tab, select CSV as the export format.
- Enter the preferred file name and location.
- Select the Columns tab and verify the columns you wish to export.
- Select the Where tab and add any criteria you need to filter the data.
- Click on Apply to export the file. (A scripted alternative using SQL*Plus is sketched after this list.)
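If you prefer scripting the export instead of clicking through the SQL Developer GUI, the same result can be achieved by spooling query output from SQL*Plus. The sketch below is only illustrative: the CLIENT table and its columns are hypothetical, and SET MARKUP CSV requires SQL*Plus 12.2 or later.
-- Spool a table to a CSV file from SQL*Plus (12.2+); table and columns are hypothetical
SET MARKUP CSV ON QUOTE ON
SET FEEDBACK OFF
SPOOL /usr/admin/source/client.csv
SELECT id, name, email FROM client;
SPOOL OFF
SET MARKUP CSV OFF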
Step 2: Copy the CSV File Using DBMS_FILE_TRANSFER
- The COPY_FILE procedure in the DBMS_FILE_TRANSFER package is used to copy a file within the local file system. The following example copies a CSV file named client.csv from the /usr/admin/source directory to the /usr/admin/destination directory as client_copy.csv.
- Use the SQL command CREATE DIRECTORY to create a directory object for the directory that contains the CSV file. For instance, to create a directory object called source for the /usr/admin/source directory on your computer system, execute the following statement:
CREATE DIRECTORY source AS '/usr/admin/source';
- Use the SQL command CREATE DIRECTORY to create a directory object for the directory into which you want to copy the CSV file. An illustration is given below:
CREATE DIRECTORY dest_dir AS '/usr/admin/destination';
Here, dest_dir is the directory object for the destination directory.
- Grant required access to the user who is going to run the COPY_FILE procedure. An illustration is given below:
GRANT EXECUTE ON DBMS_FILE_TRANSFER TO admin;
GRANT READ ON DIRECTORY source TO admin;
GRANT WRITE ON DIRECTORY dest_dir TO admin;
- Connect as the admin user and provide the password when prompted:
CONNECT admin
- Execute the COPY_FILE procedure to copy the file:
BEGIN
  DBMS_FILE_TRANSFER.COPY_FILE(
    source_directory_object      => 'source',
    source_file_name             => 'client.csv',
    destination_directory_object => 'dest_dir',
    destination_file_name        => 'client_copy.csv');
END;
/
Step 3: Upload to Google Cloud Storage
Once the data has been extracted from Oracle DB, the next step is to upload it to GCS. There are multiple ways this can be achieved, as explained below.
GCP provides the gsutil tool to help manage objects and buckets in GCS. It offers an easy way to load a file from your local machine to GCS.
To copy a file to GCS:
gsutil cp client_copy.csv gs://my-bucket/path/to/folder/
To copy an entire folder to GCS:
gsutil cp -r dir gs://my-bucket/path/to/parent/
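To confirm that the upload succeeded, you can list the contents of the target path with gsutil (the bucket and folder names here are the hypothetical ones used above):
gsutil ls gs://my-bucket/path/to/folder/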
An alternative way to upload the data from your local machine to GCS is the web console. To use the web console, follow the steps laid out below.
- Log in to the GCP console (you need a working Google account to use GCP). Click on the hamburger menu, select Storage from the drop-down menu, and navigate to Browser in the left tab.
- Create a new bucket to which you will migrate your data. Make sure the name you choose is globally unique.
- Click on the bucket you created and select Upload files. This action takes you to your local directory where you choose the file you want to upload.
- The data upload starts immediately and a progress bar is shown. Once the upload completes, the file will appear in the bucket.
Step 4: Upload to BigQuery from GCS
To load the data into BigQuery, you can use either the web console UI or the command line. Let us look briefly at both methods.
First, let's look at uploading the data using the web console UI.
- The first step is to go to the BigQuery console under the hamburger menu.
- Create a dataset and fill out the drop-down form.
- Click on the dataset you created. A 'Create table' option will appear below the Query Editor. Select it.
- Fill in the drop-down list and create the table. To finish loading the table, the schema has to be specified. This will be done using the command-line tool, since interacting with GCS is a lot easier and more straightforward from the command line.
To access the command line, click on the Activate Cloud Shell icon at the top of the GCP console.
The syntax of the bq command line is shown below:
bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] [SCHEMA]
[LOCATION] is an optional parameter that represents your Location.
[FORMAT] is to be set to CSV.
[DATASET] represents an existing dataset.
[TABLE] is the table name into which you're loading data.
[PATH_TO_SOURCE] is a fully-qualified Cloud Storage URI.
[SCHEMA] is a valid schema. The schema must be a local JSON file or inline.
Note: Instead of supplying a schema definition, you can use the --autodetect flag.
You can specify your schema using the bq command line. An illustration using a JSON schema file is shown below:
bq --location=US load --source_format=CSV your_dataset.your_table gs://your_bucket/your_data.csv ./your_schema.json
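For reference, a schema file such as your_schema.json is simply a JSON array of field definitions. The field names and types below are hypothetical, chosen to match the client example used earlier:
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "email", "type": "STRING", "mode": "NULLABLE"}
]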
The schema can also be auto-detected. An example is shown below:
bq --location=US load --autodetect --source_format=CSV your_dataset.your_table gs://mybucket/data.csv
- The BigQuery command-line interface offers three options for writing to an existing table. We will use this to load data into the table we created above.
The options are:
a) Overwrite the table
bq --location=US load --autodetect --replace --source_format=CSV your_dataset_name.your_table_name gs://bucket_name/path/to/file/file_name.csv
b) Append the table
bq --location=US load --autodetect --noreplace --source_format=CSV your_dataset_name.your_table_name gs://bucket_name/path/to/file/file_name.csv ./schema_file.json
c) Add a new field to the target table. In this code, the schema will be given an extra field.
bq --location=asia-northeast1 load --noreplace --schema_update_option=ALLOW_FIELD_ADDITION --source_format=CSV your_dataset.your_table gs://mybucket/your_data.csv ./your_schema.json
Step 5: Update the Target Table in BigQuery
The data loaded in the steps above has not yet been fully applied to the target table.
Because GCS acts as a staging area for BigQuery uploads, the data first lands in an intermediate table before being merged into the final table in BigQuery.
There are two ways of updating the final table, as explained below (a single-statement MERGE alternative is sketched after these options).
- Update the rows in the final table and insert new rows from the intermediate table.
UPDATE final_table t
SET t.value = s.value
FROM intermediate_data_table s
WHERE t.id = s.id;

INSERT final_table (id, value)
SELECT id, value
FROM intermediate_data_table
WHERE id NOT IN (SELECT id FROM final_table);
- Delete the rows in the final table that also exist in the intermediate table, then insert all rows from the intermediate table.
DELETE FROM data_set_name.final_table f
WHERE f.id IN (SELECT id FROM data_set_name.intermediate_data_table);

INSERT data_set_name.final_table (id, value)
SELECT id, value
FROM data_set_name.intermediate_data_table;
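As an alternative to running separate UPDATE/DELETE and INSERT statements, BigQuery also supports the MERGE statement, which performs the upsert in a single atomic operation. A minimal sketch, assuming the same hypothetical table and column names as above:
-- Upsert from the intermediate (staging) table into the final table in one statement
MERGE data_set_name.final_table f
USING data_set_name.intermediate_data_table s
ON f.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);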
Limitations of Using Custom ETL Scripts to Connect Oracle to BigQuery
- Writing custom code would add value only if you are looking to move data once from Oracle to BigQuery.
- When a use case arises that requires data to be synced from Oracle into BigQuery on an ongoing basis or in real time, you would have to move it incrementally. This process is called Change Data Capture (CDC). The custom code method described above does not handle this, so you would have to write additional code to achieve it (a sketch of what incremental extraction involves is given after this list).
- When you build custom SQL scripts to extract a subset of the data set in Oracle DB, there is a chance that the script breaks as the source schema keeps changing or evolving.
- Often, there arises a need to transform the data (e.g., masking Personally Identifiable Information) before loading it into BigQuery. Achieving this would require additional time and resources.
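To give a sense of what incremental extraction involves, the sketch below pulls only the rows changed since the last successful sync. It assumes a hypothetical LAST_MODIFIED timestamp column on the source table; in practice you would also need to persist the high-water mark between runs and handle deletes separately.
-- Oracle-side incremental extraction (hypothetical CLIENT table with a LAST_MODIFIED column)
SELECT id, name, email, last_modified
FROM client
WHERE last_modified > TO_TIMESTAMP('2024-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS');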
In a nutshell, ETL scripts are fragile with a high propensity to break. This makes the entire process error-prone and becomes a huge hindrance in the path of making accurate, reliable data available in BigQuery.
Method 2: Using Hevo to Connect Oracle to BigQuery
Using a fully managed No-Code Data Pipeline platform like Hevo can help you replicate data from Oracle to BigQuery in minutes. Hevo completely automates the process of not only loading data from Oracle but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.
Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss. Here are the steps to replicate data from Oracle to BigQuery using Hevo:
- Step 1: Connect to your Oracle database by providing the Pipeline Name, Database Host, Database Port, Database User, Database Password, and Service Name.
- Step 2: Configure Oracle to BigQuery Warehouse migration by providing the Destination Name, Project ID, GCS Bucket, Dataset ID, Enabling Stream Inserts, and Sanitize Table/Column Names.
Here are more reasons to love Hevo:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Auto Schema Mapping: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo is Built to Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Conclusion
This blog talks about the two methods you can use to connect Oracle to BigQuery in a seamless fashion. If you rarely need to transfer your data from Oracle to BigQuery, then the first, manual method will work fine. However, if you require Real-Time Data Replication and are looking for an Automated Data Pipeline Solution, then Hevo is the right choice for you!
FAQ on Oracle to BigQuery
How to Convert Oracle Query to BigQuery?
Map Oracle data types and functions to their BigQuery equivalents (e.g., VARCHAR2 to STRING, NVL() to IFNULL()). Adjust syntax for differences in SQL functions and features.
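For instance, a simple Oracle query and a possible BigQuery equivalent might look like this (the table and column names are hypothetical):
-- Oracle
SELECT client_id, NVL(email, 'unknown') AS email
FROM clients
WHERE ROWNUM <= 10;

-- BigQuery
SELECT client_id, IFNULL(email, 'unknown') AS email
FROM my_dataset.clients
LIMIT 10;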
How to connect Oracle to BigQuery?
Use a data migration tool like Google Cloud Dataflow, Informatica, or Fivetran, or set up an ETL pipeline to extract data from Oracle and load it into BigQuery.
What is the difference between Oracle and BigQuery?
Oracle is an on-premise, traditional relational database, while BigQuery is a fully managed, serverless data warehouse optimized for large-scale analytics on Google Cloud.
Want to take Hevo for a spin? Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
Bukunmi is curious about complex concepts and the latest trends in data science, and combines his flair for writing to curate content that helps data teams solve business challenges.