In this blog, we will discuss ElasticSearch, MongoDB, and how you can connect MongoDB to ElasticSearch to index and search massive datasets.
Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support call.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow to check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!
Introduction to ElasticSearch
ElasticSearch is an open-source tool designed to index data and provide near real-time search. It is a distributed search engine capable of indexing very large datasets. Its basic concepts are Near Real-Time (NRT) search, Cluster, Node, Index, Type, Document, and Shards & Replicas.
ElasticSearch can be used as a search and analytics engine for all types of data, including numerical, textual, geospatial, structured, and unstructured data. It is generally used as part of the ELK stack (ElasticSearch, LogStash, and Kibana) and is known for its speed, scalability, REST API, and distributed nature.
Use of ElasticSearch
Its distributed nature, speed, scalability, and ability to index any document make ElasticSearch a fit for almost any search or analytics workload. Common use cases include:
- Application search
- Website search
- Enterprise search
- Logging and log analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Security analytics
- Business analytics
Introduction to MongoDB
MongoDB is an open-source NoSQL database that uses a document-oriented data model to store data and provides its own JSON-based query language. MongoDB is widely used among organizations and is one of the most popular NoSQL databases on the market.
NoSQL means it does not use rows and columns to store data; instead, it stores data in documents and groups those documents into collections. Each document consists of a set of key-value pairs and is stored in a format known as BSON (Binary JSON). This model also allows MongoDB to scale horizontally across multiple servers.
MongoDB allows you to modify schemas without any downtime. It is flexible enough to let you combine and store data of many different types without compromising on its powerful indexing options, data access, and validation rules.
Integrate ElasticSearch and MongoDB
MongoDB is used for storage, and ElasticSearch is used to perform full-text indexing over the data. Hence, the combination of MongoDB for storing and ElasticSearch for indexing is a common architecture that many organizations follow.
You can use various tools to replicate the data from MongoDB to ElasticSearch for indexing. Let’s look at some of the top plugins or tools to copy or synchronize data from MongoDB to ElasticSearch.
MongoDB River Plugin
ElasticSearch-River-MongoDB is a plugin used to synchronize data between ElasticSearch and MongoDB. Note that rivers were deprecated in ElasticSearch 1.5 and removed in 2.0, so this plugin applies only to older ElasticSearch versions.
In a MongoDB replica set, every write operation (insert, update, delete) is recorded in the operation log (oplog), a capped collection that keeps a rolling record of changes. The River plugin monitors the oplog and automatically syncs those changes to ElasticSearch based on its configuration. Once the data is synced, the indexes are updated automatically within ElasticSearch.
For the source code of River Plugin, you can refer to the GitHub link here – elasticsearch-river-mongodb
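To make the oplog-following idea concrete, here is a minimal Python sketch of how oplog entries could be translated into ElasticSearch bulk-API lines. It is an illustration only, not the River plugin's actual code: the function name `oplog_to_bulk_lines` is hypothetical, and the oplog entries are simplified to the `op`, `o`, and `o2` fields with `$set`-style updates.

```python
import json


def oplog_to_bulk_lines(entry, index_name, type_name):
    """Translate one simplified oplog entry into Elasticsearch bulk-API lines.

    Hypothetical helper for illustration; real oplog entries carry more fields.
    """
    # Action metadata; "_type" is included because river-era clusters use types.
    meta = {"_index": index_name, "_type": type_name}
    op = entry["op"]
    if op == "i":  # insert: index the whole document under its MongoDB _id
        doc = dict(entry["o"])
        meta["_id"] = str(doc.pop("_id"))
        return [json.dumps({"index": meta}), json.dumps(doc)]
    if op == "u":  # update: "o2" identifies the document, "o" holds the change
        meta["_id"] = str(entry["o2"]["_id"])
        changed_fields = entry["o"].get("$set", {})
        return [json.dumps({"update": meta}), json.dumps({"doc": changed_fields})]
    if op == "d":  # delete: only an action line, no document body
        meta["_id"] = str(entry["o"]["_id"])
        return [json.dumps({"delete": meta})]
    return []  # other entry types (for example no-ops) are skipped


if __name__ == "__main__":
    sample_oplog = [
        {"op": "i", "o": {"_id": 1, "name": "Ann"}},
        {"op": "u", "o2": {"_id": 1}, "o": {"$set": {"name": "Anna"}}},
        {"op": "d", "o": {"_id": 1}},
    ]
    for entry in sample_oplog:
        for line in oplog_to_bulk_lines(entry, "mongoindex", "person"):
            print(line)
```

Each insert or update produces an action line followed by a body line, while a delete produces only the action line, which is the shape the bulk endpoint expects.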
Steps to use Mongo River Connector
This plugin requires MongoDB as the source and ElasticSearch as the target to migrate and sync data between the two systems.
- To install the plugin, execute the below command from the ElasticSearch installation directory –
bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
- Check the compatibility of the connector with the ElasticSearch version here.
- Create the indexing river with the below curl syntax –
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "DATABASE_NAME",
"collection": "COLLECTION",
"gridfs": true
},
"index": {
"name": "ES_INDEX_NAME",
"type": "ES_TYPE_NAME"
}
}'
Example –
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "testmongo",
"collection": "person"
},
"index": {
"name": "mongoindex",
"type": "person"
}
}'
- To view the indexed data in ElasticSearch, open the following URL –
http://localhost:9200/mongoindex/person/_search?pretty
To know more about MongoDB-River Plugin usage, you can check the GitHub repo here.
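The river steps above can also be scripted. The following Python sketch builds the same PUT request and search URL using only the standard library. The helper names `build_river_request` and `build_search_url` are hypothetical, and the host and port are assumed to match the curl examples. Nothing is sent until you pass the request to `urllib.request.urlopen` against a running cluster that has the plugin installed.

```python
import json
import urllib.request
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed: same host and port as the curl examples


def build_river_request(db, collection, index_name, type_name, river_name="mongodb"):
    """Build (but do not send) the PUT request that registers a MongoDB river."""
    body = {
        "type": "mongodb",
        "mongodb": {"db": db, "collection": collection},
        "index": {"name": index_name, "type": type_name},
    }
    return urllib.request.Request(
        f"{ES_URL}/_river/{river_name}/_meta",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


def build_search_url(index_name, type_name, **params):
    """Build the search URL for an index and type, asking for pretty-printed JSON."""
    query = urlencode({"pretty": "true", **params})
    return f"{ES_URL}/{index_name}/{type_name}/_search?{query}"


if __name__ == "__main__":
    request = build_river_request("testmongo", "person", "mongoindex", "person")
    print(request.get_method(), request.full_url)
    print(build_search_url("mongoindex", "person"))
```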
LogStash
LogStash is an open-source tool from the ELK stack that collects data from multiple sources, normalizes it, and ships it to one or more destinations. It reads data from the source through an input, modifies it using filters, and then writes it to the destination through an output.
Because LogStash is part of the ELK stack, it integrates well with ElasticSearch. You can use LogStash to read from MongoDB through an input plugin such as logstash-input-mongodb and write to ElasticSearch, using filters to modify the data in transit (if required).
To get started with LogStash and to read more about LogStash, you can look here.
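The input, filter, and output flow described above can be pictured with a small Python sketch. This is a conceptual model only, not how LogStash is implemented; all function names are hypothetical. The filter stage moves MongoDB's `_id` into a regular field, on the assumption that the target treats `_id` as reserved metadata rather than as part of the document body.

```python
def read_input(documents):
    """Input stage: emit one event per source document."""
    for document in documents:
        yield dict(document)  # copy so later stages never mutate the source


def apply_filters(events):
    """Filter stage: move MongoDB's _id into an ordinary field named mongo_id."""
    for event in events:
        if "_id" in event:
            event["mongo_id"] = str(event.pop("_id"))
        yield event


def write_output(events, destination):
    """Output stage: append each event to the destination (a list stands in for an index)."""
    for event in events:
        destination.append(event)
    return destination


def run_pipeline(documents):
    """Chain the three stages in order: input, then filter, then output."""
    return write_output(apply_filters(read_input(documents)), [])


if __name__ == "__main__":
    print(run_pipeline([{"_id": 7, "name": "Ann"}]))
```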
Steps to use LogStash
- To connect ElasticSearch and MongoDB via LogStash, you need the “logstash-input-mongodb” input plugin.
- Navigate to the LogStash installation directory and run the following commands –
cd /usr/share/logstash
bin/logstash-plugin install logstash-input-mongodb
- Once the installation is successful, you need to create a configuration file that takes MongoDB as the input and ElasticSearch as the output. A sample configuration file looks like this –
input {
  mongodb {
    uri => 'mongodb://username:password@xxxx-00-00-nxxxn.mongodb.net:27017/xxxx?ssl=true'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'users'
    batch_size => 5000
  }
}
filter {
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "index"
    index => "mongo_log_data"
    hosts => ["localhost:9200"]
  }
}
- Once the configuration file is successfully set up, you can execute the below command to start the pipeline.
bin/logstash -f /etc/logstash/conf.d/mongodata.conf
- The above command starts fetching data from the MongoDB collection and pushes it to ElasticSearch for indexing. ElasticSearch will create an index named “mongo_log_data”.
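If you maintain several such pipelines, you may prefer to generate the configuration file rather than edit it by hand. The Python sketch below renders a pipeline from a few parameters. The `render_pipeline` helper and its defaults are hypothetical, and the template assumes the plugin's settings sit inside a `mongodb { ... }` block within `input`.

```python
from string import Template

# Template for a MongoDB -> Elasticsearch pipeline; $name fields are filled in below.
PIPELINE_TEMPLATE = Template("""\
input {
  mongodb {
    uri => '$uri'
    placeholder_db_dir => '$placeholder_dir'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => '$collection'
    batch_size => $batch_size
  }
}
output {
  elasticsearch {
    action => "index"
    index => "$index"
    hosts => ["$es_host"]
  }
}
""")


def render_pipeline(uri, collection, index, es_host="localhost:9200",
                    placeholder_dir="/opt/logstash-mongodb/", batch_size=5000):
    """Return the text of a pipeline configuration for one collection."""
    return PIPELINE_TEMPLATE.substitute(
        uri=uri,
        collection=collection,
        index=index,
        es_host=es_host,
        placeholder_dir=placeholder_dir,
        batch_size=batch_size,
    )


if __name__ == "__main__":
    print(render_pipeline("mongodb://localhost:27017/testmongo", "users", "mongo_log_data"))
```

You could write the returned text to a file such as the mongodata.conf used above and start LogStash with it.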
Mongo Connector
Mongo-Connector is an open-source tool from MongoDB Labs: a real-time sync system written in Python that lets you copy documents from MongoDB to target systems.
Mongo Connector creates a pipeline from a MongoDB cluster to target systems such as ElasticSearch or Solr. On startup, it connects MongoDB to the target system and copies the existing data. Afterward, it regularly checks for updates and applies them to the target system to keep everything in sync.
To sync data to ElasticSearch, MongoDB needs to run in replica-set mode. Once the initial sync is completed, Mongo Connector tails the MongoDB oplog (operation log) to keep everything in sync in real time.
To know more about Mongo Connector, you can look at the official page here – mongo-connector
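The two phases described above, an initial copy followed by oplog tailing, can be simulated in a few lines of Python. This is a simplified model under stated assumptions, not Mongo Connector's implementation: a dictionary stands in for the target index, integers stand in for oplog timestamps, and the function names are hypothetical.

```python
def initial_sync(source_docs, target):
    """Phase 1: copy every existing document into the target, keyed by _id."""
    for doc in source_docs:
        target[doc["_id"]] = {k: v for k, v in doc.items() if k != "_id"}


def apply_oplog(oplog, target, checkpoint):
    """Phase 2: apply entries newer than the checkpoint and return the new checkpoint."""
    for entry in oplog:
        if entry["ts"] <= checkpoint:
            continue  # already applied in an earlier pass
        if entry["op"] == "i":
            doc = entry["o"]
            target[doc["_id"]] = {k: v for k, v in doc.items() if k != "_id"}
        elif entry["op"] == "u":
            changes = entry["o"].get("$set", {})
            target.setdefault(entry["o2"]["_id"], {}).update(changes)
        elif entry["op"] == "d":
            target.pop(entry["o"]["_id"], None)
        checkpoint = entry["ts"]  # remember how far we have read
    return checkpoint


if __name__ == "__main__":
    index = {}
    initial_sync([{"_id": 1, "name": "Ann"}], index)
    oplog = [
        {"ts": 1, "op": "i", "o": {"_id": 2, "name": "Bob"}},
        {"ts": 2, "op": "u", "o2": {"_id": 1}, "o": {"$set": {"name": "Anna"}}},
        {"ts": 3, "op": "d", "o": {"_id": 2}},
    ]
    last_seen = apply_oplog(oplog, index, checkpoint=0)
    print(index, last_seen)
```

Keeping a checkpoint means a restart can resume from the last applied entry instead of repeating the full copy.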
Steps to use Mongo Connector
- Download the ElasticSearch Doc Manager. A DocManager is a lightweight, simple-to-write class that defines a limited number of CRUD operations for the target system. The Doc Managers for ElasticSearch are available here –
Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager
Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager
- Install Mongo Connector based on the version of ElasticSearch you’re using. The following table will help you choose the correct installation command –
ElasticSearch Version | Installation Command |
Elasticsearch 1.x | pip install 'mongo-connector[elastic]' |
Amazon Elasticsearch 1.x Service | pip install 'mongo-connector[elastic-aws]' |
Elasticsearch 2.x | pip install 'mongo-connector[elastic2]' |
Amazon Elasticsearch 2.x Service | pip install 'mongo-connector[elastic2-aws]' |
Elasticsearch 5.x | pip install 'mongo-connector[elastic5]' |
- Mongo Connector uses the MongoDB oplog to replicate operations, so a replica set must be running before startup. To create a single-node replica set, start mongod with the --replSet option and then run rs.initiate() from the mongo shell –
mongod --replSet myDevReplSet
rs.initiate()
- Once the replica set is up and running, you can invoke the connector as –
mongo-connector -m <mongodb server hostname>:<replica set port> -t <replication endpoint URL, e.g. http://localhost:9200> -d <name of doc manager, e.g., elastic2_doc_manager>
To know more about using Mongo Connector with ElasticSearch, follow the GitHub guide here.
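The installation table and the invocation step above can be wrapped in a small Python helper. This is a sketch: the function names are hypothetical, and the doc manager name `elastic2_doc_manager` and the port 9200 in the usage example are assumptions about a typical ElasticSearch 2.x setup that you should check against your installation.

```python
# Mapping from the table above: (ElasticSearch major version, hosted on AWS?) -> pip extra
PIP_EXTRAS = {
    (1, False): "elastic",
    (1, True): "elastic-aws",
    (2, False): "elastic2",
    (2, True): "elastic2-aws",
    (5, False): "elastic5",
}


def pip_install_command(es_major, aws=False):
    """Return the pip command for the given ElasticSearch major version."""
    try:
        extra = PIP_EXTRAS[(es_major, aws)]
    except KeyError:
        raise ValueError(f"no known extra for version {es_major} (aws={aws})") from None
    return f"pip install 'mongo-connector[{extra}]'"


def connector_command(mongo_address, target_url, doc_manager):
    """Return the mongo-connector invocation as an argument list for subprocess.run."""
    return ["mongo-connector", "-m", mongo_address, "-t", target_url, "-d", doc_manager]


if __name__ == "__main__":
    print(pip_install_command(2))
    # Assumed values: local replica set member, local cluster, 2.x doc manager.
    print(" ".join(connector_command("localhost:27017", "localhost:9200", "elastic2_doc_manager")))
```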
Conclusion
In this blog post, we have discussed how easily you can connect ElasticSearch and MongoDB for continuous indexing and searching of the documents. However, if you’re looking for a more straightforward solution, you can use Hevo Data – a No Code Data pipeline that you can use to build an ETL pipeline in an instant.
Visit our Website to Explore Hevo
Hevo integrates with 100+ sources, including SaaS applications, databases, BI tools, etc.
Want to take Hevo for a spin?
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also look at the unbeatable pricing that will help you choose the right plan for your business needs.