Integrating Elasticsearch and Oracle Data: Easy Steps

on Data Integration, Tutorials • November 12th, 2020

Are you looking to integrate your Elasticsearch and Oracle data? We have you covered. In this blog, you will learn the basics of Elasticsearch and the Oracle database, and how to connect them using Logstash.

What is Elasticsearch?

Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene and developed in Java. At its core, Elasticsearch is a server that processes JSON requests and returns JSON responses. It lets you store, search, and analyze huge volumes of data quickly and in near real time, typically returning answers in milliseconds. It uses a document-based structure instead of tables and schemas and comes with extensive REST APIs for storing and searching data. Its backend components include Clusters, Nodes, Shards, and Replicas. Elasticsearch is best known as the E in the famous ELK Stack, which combines Elasticsearch, Logstash, and Kibana (and is often extended with Beats). Below are the primary use cases of Elasticsearch:

  • Application Search 
  • Website Search 
  • Enterprise Search 
  • Logging & Log Analytics
  • Security Analytics
  • Business Analytics 
  • Infrastructure metrics & container monitoring
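
Since Elasticsearch speaks JSON over REST, the two core operations above boil down to sending JSON bodies to its API. A minimal sketch of what those bodies look like (the index name, document fields, and IDs here are hypothetical, for illustration only):

```python
import json

# A document to index: Elasticsearch stores JSON documents, not table rows.
# This body would be sent as e.g. PUT /articles/_doc/1
doc = {"title": "Integrating Elasticsearch and Oracle", "views": 1200}

# A simple full-text search request body (Query DSL "match" query),
# sent as e.g. GET /articles/_search
search_body = {"query": {"match": {"title": "elasticsearch"}}}

doc_json = json.dumps(doc)
search_json = json.dumps(search_body)
print(doc_json)
print(search_json)
```

Both payloads are plain JSON, which is why any HTTP client (curl, a browser, or an application) can talk to Elasticsearch directly.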

What is Oracle Database?

Oracle Database is a widely used relational database management system from Oracle Corporation; its cloud variant, Oracle Autonomous Database, is an all-in-one database solution for data marts, data lakes, operational reporting, and batch data processing. It is commonly used for running Online Transaction Processing (OLTP), Data Warehousing, and mixed database workloads. Oracle's database products compete with open-source relational and non-relational databases such as PostgreSQL, Couchbase, MongoDB, Neo4j, and many more.

Pre-requisites:

  1. Elasticsearch must be set up on your machine. Installing Elasticsearch on Windows is straightforward: download the latest version and extract it into a folder. The official Elasticsearch documentation explains how to install it and run it on a server.
  2. In the environment variables section, set the JAVA_HOME path.
  3. Now, if you go to http://localhost:9200/ in your browser, you should see a JSON response confirming that Elasticsearch is running.
  4. Install Logstash for migrating data from Oracle to Elasticsearch.
  5. To extract the data, make sure your Oracle server is up and running.
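
For step 3, a running node's root endpoint returns a small JSON document that includes the version. A sketch of parsing such a response (the sample payload below is illustrative, not captured from a live node):

```python
import json

# Illustrative sample of the JSON that GET http://localhost:9200/ returns.
sample_response = (
    '{"name": "my-node", "cluster_name": "elasticsearch",'
    ' "version": {"number": "7.10.0"}}'
)

def extract_version(body: str) -> str:
    """Pull the version number out of the root-endpoint JSON."""
    return json.loads(body)["version"]["number"]

print(extract_version(sample_response))
```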

Hevo, A Simpler Alternative to Integrate your Data for Analysis

Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support call.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

You can try Hevo for free by signing up for a 14-day free trial.

Role of Logstash in Integrating Elasticsearch and Oracle Data

Logstash is an open-source, server-side data processing pipeline that ingests data from many data sources, transforms it, and then sends it to the destination. Logstash supports a variety of inputs that pull in events from a multitude of common sources, all at the same time. It allows you to easily and continuously ingest data from your logs, metrics, web applications, and various AWS services in a streaming manner. The official Logstash documentation includes a detailed guide on how Logstash can be installed and run on a Windows machine.

As Logstash is the primary component for ingesting the data, it is very important to make suitable changes to the logstash-ora.conf file, as described later in detail.

JDBC Connection

Java Database Connectivity (JDBC) is a Java API that defines how a client can access a database. It is part of the Java Standard Edition platform from Oracle Corporation. Its basic purpose is to provide methods for querying and updating data in a database, and it is oriented primarily towards relational databases.
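
JDBC's connect / execute / iterate pattern mirrors other database APIs. As a rough analogue, here is the same pattern using Python's built-in sqlite3 module instead of Java, purely for illustration (table and column names are made up):

```python
import sqlite3

# In JDBC terms: DriverManager.getConnection(...) returns a Connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The "update" side: DDL/DML statements change the database.
cur.execute("CREATE TABLE search (id INTEGER, term TEXT)")
cur.execute("INSERT INTO search VALUES (1, 'oracle'), (2, 'elasticsearch')")
conn.commit()

# The "query" side: a SELECT returns an iterable result set
# (the equivalent of a ResultSet in JDBC).
rows = cur.execute("SELECT term FROM search ORDER BY id").fetchall()
print(rows)
```

The Logstash JDBC input plugin discussed below drives exactly this pattern against Oracle, using the Oracle JDBC driver instead of sqlite3.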

JDBC Input Connection

It is a component of the JDBC integration plugin. The plugin is a way to ingest data from any database with a JDBC interface into Logstash. To use this input plugin, the JDBC driver library must be passed explicitly to the plugin using the jdbc_driver_library configuration option. The plugin also supports scheduling, based on rufus-scheduler, so ingestion can run periodically using cron-like syntax.

For detailed information about the JDBC input plugin configuration, refer to the official Logstash documentation.

Basic JDBC input parameters setting looks like this:

input {
  jdbc {
    jdbc_driver_library => ""
    jdbc_driver_class => ""
    jdbc_connection_string => ""
    jdbc_user => ""
    parameters => {}
    schedule => ""
    statement => ""
  }
}

Steps for Setting Up Connection & Migrating the Data

  1. Edit the logstash-ora.conf file under the config folder in Logstash as below. logstash-ora.conf is the Oracle pipeline configuration file.

input {
  jdbc {
    jdbc_validate_connection => true
    jdbc_connection_string => "jdbc:oracle:thin:@192.168.1.2:1881/xe"
    jdbc_user => "user_details"
    jdbc_password => "user_password"
    jdbc_driver_library => "C:\elk-2\OJDBC-Full\ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    statement => "select * from search"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "rawsearchdata"
    #user => "elastic"
    #password => "changeme"
  }
}

Input is:

  • jdbc_connection_string = the connection string for Oracle
  • jdbc_user = the schema name
  • jdbc_driver_library = the path to ojdbc7.jar (the OJDBC driver is freely available for download for all operating systems)

Output is:

  • index = the Elasticsearch index where the data will be stored

  2. Save the file and run the command logstash.bat -f ../config/logstash-ora.conf
  3. Once this is done, the data will be loaded into the rawsearchdata index.

To load only specific fields, pass the column names in the SELECT statement instead of using select *.
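
Conceptually, Logstash turns each row returned by the SELECT statement into one JSON document for the rawsearchdata index. A sketch of that row-to-document mapping (the column names here are hypothetical; real events also carry Logstash metadata fields):

```python
import json

# One row as the JDBC input plugin might see it: column name -> value.
row = {"id": 42, "term": "oracle integration", "created_at": "2020-11-12"}

def row_to_event(row: dict, index: str):
    """Wrap a database row into a bulk-API-style action + document pair."""
    action = {"index": {"_index": index, "_id": row["id"]}}
    return action, row

action, document = row_to_event(row, "rawsearchdata")
print(json.dumps(action))
print(json.dumps(document))
```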

Common errors you might get in this process:

  1. “Error: java::oracle.jdbc.driver.OracleDriver not loaded. Are you sure you’ve included the correct JDBC driver in :jdbc_driver_library? You may be interested in the ‘–configtest’ flag which you can use to validate logstash’s configuration before you choose to restart a running system.”

    This error usually arises from a typo: for example, writing “java::oracle.jdbc.driver.OracleDriver” instead of “Java::oracle.jdbc.driver.OracleDriver” (the Java:: prefix is case-sensitive). Check the jdbc_driver_class value carefully, as these typos can be very annoying to track down.
  2. “Java::JavaSql::SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: “host:port:sid”.”

    This error could arise because of the connection string format, e.g. “jdbc:oracle:thin:@192.168.1.2:1881/xe”. The thin driver expects either host:port:SID (with a colon before the SID) or the service-name form //host:port/service_name, so adjust the string accordingly.
  3. “Java::JavaSql::SQLRecoverableException: IO Error: The Network Adapter could not establish the connection”.

    When this error occurs, check the following:
    • The database and the listener are running.
    • The specified port in the code is the same port that the database is listening to.
    • There is no firewall blocking.
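
For errors 2 and 3, it helps to know the two connection string shapes the Oracle thin driver accepts, and to confirm the listener port is reachable at all. A sketch, where the host, port, and SID values are placeholders taken from the example above:

```python
import socket

# The thin driver accepts either the SID form (colon before the SID)
# or the service-name form (//host:port/service).
def thin_url_sid(host: str, port: int, sid: str) -> str:
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"

def thin_url_service(host: str, port: int, service: str) -> str:
    return f"jdbc:oracle:thin:@//{host}:{port}/{service}"

# "The Network Adapter could not establish the connection" usually means the
# host:port is simply unreachable; a plain TCP probe helps narrow that down.
def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(thin_url_sid("192.168.1.2", 1881, "xe"))
print(thin_url_service("192.168.1.2", 1881, "xe"))
```

If port_open returns False for your database host and port, fix the listener or firewall before revisiting the Logstash configuration.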

Conclusion

The process of connecting Elasticsearch with the Oracle database can be intimidating and usually surfaces errors that need to be resolved to get the connection working. One simpler way is to use Hevo Data, which automates the whole connection process by building automated data pipelines.

Hevo is a No-code data pipeline. It has pre-built integrations with 100+ sources. You can connect your SaaS platforms, databases, etc. to any data warehouse of your choice, without writing any code or worrying about maintenance. If you are interested, you can try Hevo by signing up for the 14-day free trial.

Share your thoughts on Elasticsearch and Oracle data in the comments!