Integrating Elasticsearch and MongoDB doesn’t have to be difficult. In this guide, discover three easy methods to integrate Elasticsearch with MongoDB, allowing you to search and analyze your data more effectively.
Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates flexible data pipelines tailored to your needs. With integrations for 150+ Data Sources (60+ free sources), we help you:
- Export data from sources & Load data to the destinations
- Transform & Enrich your data, & make it analysis-ready
- Experience Completely Automated Pipelines with Real-Time Data Transfer
Take Hevo’s 14-day free trial to experience a better way to manage your data pipelines. Discover why top companies like Postman choose Hevo to build their data pipelines.
Overview of Elasticsearch
Elasticsearch is a distributed, open-source search and analytics engine built for handling large volumes of data and enabling real-time search capabilities. It provides scalable and efficient solutions for indexing, searching, and analyzing diverse types of structured and unstructured data.
Integrating Elasticsearch and MongoDB allows for efficient handling of diverse data types, combining MongoDB’s document-oriented storage with Elasticsearch’s powerful search capabilities.
Overview of MongoDB
MongoDB is a NoSQL database that offers a flexible, document-oriented data model, allowing for the storage and retrieval of data in a JSON-like BSON format.
In this architecture, MongoDB handles storage while Elasticsearch performs full-text indexing over the data. This combination of MongoDB for storing and Elasticsearch for searching is a common pattern that many organizations follow.
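To illustrate the document model, here is a minimal sketch in the mongo shell (insertOne requires MongoDB 3.2 or later); the database and collection names mirror the river example later in this guide –
use testmongo
db.person.insertOne({ "name": "Jane Doe", "city": "Berlin", "skills": ["search", "nosql"] })
MongoDB stores this JSON-like document as BSON, and no upfront schema is required.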
Methods to Connect Elasticsearch and MongoDB in Real-Time
You can use various tools to replicate the data between Elasticsearch and MongoDB for indexing. Let’s look at some of the top plugins and tools to copy or synchronize data from MongoDB to Elasticsearch.
Method 1: Using MongoDB River Plugin
- Elasticsearch-River-MongoDB is a plugin that synchronizes data between MongoDB and Elasticsearch. Note that river plugins were deprecated in Elasticsearch 1.5 and removed in 2.0, so this method applies only to older Elasticsearch versions.
- In MongoDB, every write operation (insert, update, delete) performed against a replica set is recorded in the operation log (oplog), a capped collection that serves as a rolling record of all changes.
- The river plugin tails the oplog and, based on its configuration, automatically replays those operations into Elasticsearch, so the corresponding indexes stay up to date. You can inspect the oplog yourself, as shown in the sketch below.
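If you want to see what the plugin consumes, query the oplog from the mongo shell of a replica-set member; a minimal sketch –
use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()
Each entry records the operation type, the affected namespace, and the document change.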
Steps to use Mongo River Connector
This plugin uses MongoDB as the source and Elasticsearch as the target to migrate and sync data between the two systems.
- To install the plugin, execute the below command from the Elasticsearch installation directory (the plugin installs into Elasticsearch, not MongoDB) –
bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
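To confirm the plugin was installed, you can list the installed plugins; this uses the plugin script of the older Elasticsearch 1.x line that the river targets –
bin/plugin --list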
- Check the compatibility of the connector with the Elasticsearch version.
- Create the indexing river with the below curl syntax –
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "DATABASE_NAME",
    "collection": "COLLECTION",
    "gridfs": true
  },
  "index": {
    "name": "ES_INDEX_NAME",
    "type": "ES_TYPE_NAME"
  }
}'
Example –
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "person"
  },
  "index": {
    "name": "mongoindex",
    "type": "person"
  }
}'
- To view the indexed data in Elasticsearch, query the index as shown below –
curl -XGET 'http://localhost:9200/mongoindex/person/_search?pretty'
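Beyond listing documents, you can run full-text queries against the new index. A minimal sketch, assuming the indexed documents contain a name field (an assumption for illustration) –
curl -XPOST 'http://localhost:9200/mongoindex/person/_search?pretty' -d '{
  "query": {
    "match": { "name": "john" }
  }
}'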
Method 2: Using Logstash
- Logstash is an open-source tool from the ELK stack that unifies data from multiple sources and normalizes it before shipping it to the destinations.
- Logstash reads data from the source, modifies it using filters, and then outputs it to the destination.
- Because Logstash is part of the ELK stack, it has excellent built-in connectivity with Elasticsearch. You can use Logstash to read from MongoDB with the logstash-input-mongodb plugin and write the output to Elasticsearch.
- With the help of filters, you can modify the data in transit to Elasticsearch if required; see the filter sketch after this list.
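As an illustration, the mutate filter below drops one field and renames another before indexing; the field names are hypothetical –
filter {
  mutate {
    remove_field => ["_id"]
    rename => { "name" => "full_name" }
  }
}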
Steps to use Logstash
- To connect Elasticsearch and MongoDB via Logstash, you need the logstash-input-mongodb input plugin.
- Navigate to the Logstash installation directory and run the following commands –
cd /usr/share/logstash
bin/logstash-plugin install logstash-input-mongodb
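You can verify that the plugin is available by listing Logstash's installed plugins –
bin/logstash-plugin list | grep mongodb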
- Once the installation is successful, create a configuration file that takes MongoDB as input and Elasticsearch as output. Note that the input settings must be wrapped in a mongodb { } block. A sample configuration file looks like the one shown below –
input {
  mongodb {
    uri => 'mongodb://username:password@xxxx-00-00-nxxxn.mongodb.net:27017/xxxx?ssl=true'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'users'
    batch_size => 5000
  }
}
filter {
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "index"
    index => "mongo_log_data"
    hosts => ["localhost:9200"]
  }
}
- Once the configuration file is successfully set up, you can execute the below command to start the pipeline.
bin/logstash -f /etc/logstash/conf.d/mongodata.conf
- The above command starts fetching data from the MongoDB collection and pushes it to Elasticsearch for indexing. In Elasticsearch, an index named mongo_log_data will be created.
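To confirm the index exists, you can query Elasticsearch's cat API –
curl 'http://localhost:9200/_cat/indices/mongo_log_data?v'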
Method 3: Using Mongo Connector
- Mongo Connector is an open-source, real-time sync system written in Python by MongoDB, Inc. that lets you copy documents from MongoDB to target systems. Note that the project is no longer actively maintained, so test it carefully before relying on it.
- Mongo Connector creates a pipeline from a MongoDB cluster to target systems such as Elasticsearch or Solr.
- On startup, it connects MongoDB to the target system and copies the existing data. Afterward, it continuously watches for updates and applies them to the target system to keep everything in sync.
- To sync the data to Elasticsearch, MongoDB needs to run in replica-set mode. Once the initial sync is completed, the connector tails the MongoDB oplog (operation log) to keep everything in sync in real time.
Steps to use Mongo Connector
- Download the Elasticsearch doc manager. A DocManager is a lightweight, simple-to-write class that defines a limited set of CRUD operations for the target system. To download the doc manager for your Elasticsearch version, follow these guides –
Elastic 1.x doc manager
Elastic 2.x doc manager
- Install the Mongo Connector variant that matches the Elasticsearch version you're using. The following table shows the correct installation command for each version.
| Elasticsearch Version | Installation Command |
| --- | --- |
| Elasticsearch 1.x | pip install 'mongo-connector[elastic]' |
| Amazon Elasticsearch 1.x Service | pip install 'mongo-connector[elastic-aws]' |
| Elasticsearch 2.x | pip install 'mongo-connector[elastic2]' |
| Amazon Elasticsearch 2.x Service | pip install 'mongo-connector[elastic2-aws]' |
| Elasticsearch 5.x | pip install 'mongo-connector[elastic5]' |
- Mongo Connector reads MongoDB's oplog to replicate operations, so a replica set must be running before startup. To create a one-node replica set, start mongod with a replica-set name –
mongod --replSet myDevReplSet
Then, from a mongo shell connected to that instance, initiate the set –
rs.initiate()
- Once the replica set is up and running, you can invoke the connector as –
mongo-connector -m <mongodb server hostname>:<replica set port> -t <replication endpoint URL, e.g. http://localhost:8983/es> -d <name of doc manager, e.g., elasticsearch_doc_manager>
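As a concrete sketch, syncing a local replica set to a local Elasticsearch 2.x node with the elastic2 doc manager would look like this (the hostnames and ports here are illustrative assumptions) –
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager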
Learn more about MongoDB-Connector Elasticsearch Usage.
Summary
- In this blog post, we discussed how easily you can connect Elasticsearch and MongoDB for continuous indexing and searching of documents.
- This article explained three different methods to connect Elasticsearch and MongoDB.
- If you’re looking for a more straightforward solution, you can use Hevo Data – a No Code Data pipeline that you can use to build an ETL pipeline in an instant.
Hevo is a cloud-based, completely managed No Code Pipeline ETL tool that offers built-in support for Elasticsearch and MongoDB. It can move data from 150+ Data Sources, including 60+ Free Sources, to most of the common Data Destinations used in the enterprise space.
Want to take Hevo for a spin? Sign up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Frequently Asked Questions
1. Is Elasticsearch similar to MongoDB?
Both Elasticsearch and MongoDB are NoSQL databases, but they serve different purposes. MongoDB is a document database, while Elasticsearch is a search and analytics engine optimized for fast text-based queries.
2. Is Elasticsearch a SQL or NoSQL database?
Elasticsearch is a NoSQL database designed for full-text search and analytics on large datasets.
3. Why use Elasticsearch over SQL?
Elasticsearch is preferred over SQL when you need fast, real-time search and analytics, especially for unstructured or text-heavy data, where traditional SQL databases may not perform as well.
Vishal Agarwal is a Data Engineer with 10+ years of experience in the data field. He has designed scalable and efficient data solutions, and his expertise lies in AWS, Azure, Spark, GCP, SQL, Python, and other related technologies. By combining his passion for writing and the knowledge he has acquired over the years, he wishes to help data practitioners solve the day-to-day challenges they face in data engineering. In his articles, Vishal applies his analytical thinking and problem-solving approaches to untangle the intricacies of data integration and analysis.