Elasticsearch is a distributed, open-source search and analytics engine built for handling large volumes of data and enabling real-time search capabilities. It provides scalable and efficient solutions for indexing, searching, and analyzing diverse types of structured and unstructured data.

MongoDB is a NoSQL database that offers a flexible, document-oriented data model, allowing for the storage and retrieval of data in a JSON-like BSON format.

Integrating Elasticsearch and MongoDB allows for efficient handling of diverse data types, combining MongoDB’s document-oriented storage with Elasticsearch’s powerful search capabilities.

In this blog, you will learn how to connect Elasticsearch and MongoDB to index and search massive datasets. Here is a detailed overview of what this blog covers.


Integrate Elasticsearch and MongoDB

In this setup, MongoDB is used for storage, while Elasticsearch performs full-text indexing over the data. This combination of MongoDB for storing and Elasticsearch for indexing is a common architecture that many organizations follow.

You can use various tools to replicate data from MongoDB to Elasticsearch for indexing. Let’s look at some of the top plugins and tools to copy or synchronize data from MongoDB to Elasticsearch.

MongoDB River Plugin

ElasticSearch-River-MongoDB is a plugin used to synchronize data between Elasticsearch and MongoDB. Note that rivers were deprecated in Elasticsearch 1.5 and later removed, so this approach applies only to older Elasticsearch versions.

In MongoDB, every write operation (insert, update, delete) performed against a replica set is recorded as a rolling record in the operation log (oplog), a capped collection. The River plugin monitors the oplog collection and automatically syncs those operations to Elasticsearch based on its configuration. Once the data is synced, the corresponding Elasticsearch indexes are updated automatically.
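To get a feel for what the plugin tails, you can inspect the oplog directly from the mongo shell. A minimal sketch, assuming MongoDB is running as a replica set (the namespace below is just an example):

use local
db.oplog.rs.find({ ns: "testmongo.person" }).sort({ $natural: -1 }).limit(1)

Each entry carries an "op" field ("i" for insert, "u" for update, "d" for delete), the namespace in "ns", and the document or change in "o" – exactly the information the river replays into Elasticsearch.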

For the source code of the River plugin, you can refer to the GitHub repository here – elasticsearch-river-mongodb

Steps to use Mongo River Connector

This plugin requires MongoDB as the source and Elasticsearch as the target to migrate and sync data between the two systems.

  1. To install the plugin, execute the below command from the Elasticsearch installation directory – 
bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
  2. Check the compatibility of the connector with the Elasticsearch version here.
  3. Create the indexing river with the below curl syntax – 
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb", 
    "mongodb": { 
      "db": "DATABASE_NAME", 
      "collection": "COLLECTION", 
      "gridfs": true
    }, 
    "index": { 
      "name": "ES_INDEX_NAME", 
      "type": "ES_TYPE_NAME" 
    }
  }'

Example – 

 curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ 
    "type": "mongodb", 
    "mongodb": { 
      "db": "testmongo", 
      "collection": "person"
    }, 
    "index": {
      "name": "mongoindex", 
      "type": "person" 
    }
  }'
  4. To view the indexed data in Elasticsearch, query the index with the following command – 
curl 'http://localhost:9200/mongoindex/person/_search?pretty'
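To see the end-to-end flow, you can insert a test document into the source collection from the mongo shell and then re-run the search above. The database and collection names here match the example river configuration; the field values are made up:

use testmongo
db.person.insert({ firstName: "John", lastName: "Doe" })

Once the river picks up the corresponding oplog entry, the search against mongoindex should return this document.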

To know more about MongoDB-River plugin usage, you can check the GitHub repository here.

Logstash

Logstash is an open-source tool from the ELK stack used to unify data from multiple sources and normalize it before delivering it to a destination. Logstash reads data from a source, modifies it using filters, and then outputs it to the destination.

As Logstash is part of the ELK stack, it has excellent capabilities to connect with Elasticsearch. You can use Logstash to take input from MongoDB using the logstash-input-mongodb plugin and output it to Elasticsearch. With the help of filters, you can modify the data in transit to ES (if required).


To get started with Logstash and read more about it, you can look here.

Steps to use Logstash

  1. To connect Elasticsearch and MongoDB via Logstash, you need the “logstash-input-mongodb” input plugin.
  2. Navigate to the Logstash installation directory and run the following commands – 
cd /usr/share/logstash
bin/logstash-plugin install logstash-input-mongodb
  3. Once the installation is successful, create a configuration file that takes MongoDB as input and Elasticsearch as output. A sample configuration file looks like this – 
input {
        mongodb {
                uri => 'mongodb://username:password@xxxx-00-00-nxxxn.mongodb.net:27017/xxxx?ssl=true'
                # sqlite file the plugin uses to track how far it has read
                placeholder_db_dir => '/opt/logstash-mongodb/'
                placeholder_db_name => 'logstash_sqlite.db'
                collection => 'users'
                batch_size => 5000
        }
}
filter {
        # optional: add filters here to transform documents in transit
}
output {
        # print each event to stdout for debugging
        stdout {
                codec => rubydebug
        }
        # index each event into the "mongo_log_data" index in Elasticsearch
        elasticsearch {
                action => "index"
                index => "mongo_log_data"
                hosts => ["localhost:9200"]
        }
}
  4. Once the configuration file is set up, execute the below command to start the pipeline.
bin/logstash -f /etc/logstash/conf.d/mongodata.conf
  5. The above command starts fetching data from the MongoDB collection and pushes it to Elasticsearch for indexing. In Elasticsearch, an index named “mongo_log_data” will be created.
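Once the pipeline has been running, you can verify that documents are arriving by querying the new index directly. A quick check, assuming Elasticsearch is listening on localhost:9200:

curl 'http://localhost:9200/_cat/indices?v'
curl 'http://localhost:9200/mongo_log_data/_search?pretty'

The first command lists all indexes along with their document counts; the second returns a sample of the indexed documents.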

Mongo Connector

Mongo-Connector is an open-source tool from MongoDB Labs: a real-time sync system built in Python that allows you to copy documents from MongoDB to target systems.

Mongo Connector creates a pipeline from a MongoDB cluster to target systems such as Elasticsearch or Solr. On startup, it connects MongoDB to the target systems and copies the data. Afterward, it regularly checks for updates and applies them continuously to the target system to keep everything in sync.

To sync data to Elasticsearch, MongoDB needs to run in replica-set mode. Once the initial sync is completed, the connector tails the MongoDB oplog (operation log) to keep everything in sync in real time.

To know more about Mongo Connector, you can look at the official page here – mongo-connector

Steps to use Mongo Connector

  1. Download the Elasticsearch Doc Manager. A Doc Manager is a lightweight, simple-to-write class that defines a limited number of CRUD operations for the target system. To download the Doc Manager for Elasticsearch, follow the guide here – 

Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

  2. Install the Mongo Connector based on the version of Elasticsearch you’re using. The following matrix maps each Elasticsearch version to its installation command – 
Elasticsearch 1.x: pip install 'mongo-connector[elastic]'
Amazon Elasticsearch 1.x Service: pip install 'mongo-connector[elastic-aws]'
Elasticsearch 2.x: pip install 'mongo-connector[elastic2]'
Amazon Elasticsearch 2.x Service: pip install 'mongo-connector[elastic2-aws]'
Elasticsearch 5.x: pip install 'mongo-connector[elastic5]'
  3. Mongo Connector uses the MongoDB oplog to replicate operations, so a replica set must be running before startup. To create a one-node replica set, start mongod with a replica-set name and then run rs.initiate() from the mongo shell – 
mongod --replSet myDevReplSet
rs.initiate()
  4. Once the replica set is up and running, you can invoke the connector as – 
mongo-connector -m <mongodb server hostname>:<replica set port> -t <replication endpoint URL, e.g. http://localhost:8983/es> -d <name of doc manager, e.g., elasticsearch_doc_manager>
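As a concrete sketch, a typical invocation against a local MongoDB replica set and a local Elasticsearch 2.x node might look like the following (the hostnames and ports assume a default local setup):

mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager

Here, elastic2_doc_manager is the module installed by the elastic2-doc-manager package; swap in the Doc Manager that matches your Elasticsearch version.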

To know more about Mongo Connector and Elasticsearch usage, follow the GitHub guide here.

Hevo, A Simpler Alternative to Integrate your Data for Analysis

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ data sources (40+ free sources), Hevo helps you not only export data from sources and load it into destinations but also transform and enrich your data to make it analysis-ready.


Get Started with Hevo for Free

Conclusion

In this blog post, we discussed how you can connect Elasticsearch and MongoDB for continuous indexing and searching of documents. However, if you’re looking for a more straightforward solution, you can use Hevo Data – a No-code Data Pipeline that lets you build an ETL pipeline in an instant.

Visit our Website to Explore Hevo

Hevo integrates with 150+ Data sources, including SaaS applications, databases, BI tools, etc.

Want to take Hevo for a spin?

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also look at the unbeatable Hevo Pricing that will help you choose the right plan for your business needs.

Vishal Agrawal
Freelance Technical Content Writer, Hevo Data

Vishal has a passion for the data realm and applies analytical thinking and a problem-solving approach to untangle the intricacies of data integration and analysis. He delivers in-depth, well-researched content ideal for solving problems pertaining to the modern data stack.
