Do you want to index your data across categories? Do you want to replicate those indices for speed and performance? Then you have landed in the right place. This article answers these questions and guides you through a step-by-step approach.
Upon a complete walkthrough of this article, you will be able to replicate Elasticsearch indices across data centers to improve performance and disaster recovery.
What is Elasticsearch?
Elasticsearch is an open-source tool designed to index data and provide near real-time search. It is a distributed search engine capable of indexing very large datasets. Its basic concepts are NRT (near real-time), cluster, node, index, type, document, shards, and replicas.
Elasticsearch can be used as a search and analytics engine for all types of data, whether numerical, textual, geospatial, structured, or unstructured. It is generally used as part of the ELK stack (Elasticsearch, Logstash, and Kibana) and is known for its speed, scalability, REST APIs, and distributed nature.
Use of Elasticsearch
Its distributed nature, speed, scalability, and ability to index any kind of document make Elasticsearch useful for almost anything. Common use cases include:
- Application search
- Website search
- Enterprise search
- Logging and log analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Security analytics
- Business analytics
What is Cross Cluster Elasticsearch Replication?
Cross-cluster replication (CCR) is a replication method in Elasticsearch that replicates data across data centers. Its benefits are:
- It brings data closer to the users or application server to reduce latency and response time.
- It enables mission-critical applications to withstand data center or region outages.
The main use cases for cross-cluster replication are:
- Disaster Recovery: Replicating data across clusters helps users recover data in case of failover. Data is replicated across data centers and can be restored if one of them fails.
- High Availability: Multiple copies of the data across the cluster will ensure that you have at least one copy of data available at any given point in time. This provides high availability of data in case any nodes are down.
- Data Locality: With replication, the data is copied closer to the user or application server. This data locality reduces latency and speeds up processing, which in turn reduces overall costs. For example, you can replicate a clothing catalog across 100 data centers around the world to minimize the distance between the data and the application server.
Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Set up Cross Cluster Elasticsearch Replication
In this section, we will discuss how to set up cross-cluster Elasticsearch replication across two data centers. Before moving to the actual steps, have a look at the prerequisites:
- Two up-and-running clusters with a license that includes cross-cluster replication.
- The necessary privileges: the read_ccr cluster privilege plus monitor and read index privileges on the leader index on the remote cluster, and the manage_ccr cluster privilege on the local cluster.
- Indexed data on the remote cluster that needs to be replicated.
Now that you have the prerequisites ready, let’s dive into the step-by-step approach to set up cross-cluster Elasticsearch replication.
Step 1: Connect to Remote Cluster
In this use case, we will be replicating the indices from the remote cluster (leader) to the local cluster (follower). To replicate an index on a remote cluster (Cluster A) to a local cluster (Cluster B), you need to configure Cluster A as a remote on Cluster B.
Use the Stack Management dashboard from Kibana to configure the clusters. To configure a remote cluster from Stack Management in Kibana:
- From the side navigation, select Remote Clusters and add a new remote cluster, giving it a name (alias) to reference it by.
- Specify the IP address or hostname of the remote cluster (Cluster A), followed by its transport port (defaults to 9300). For example, 192.168.1.1:9300.
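If you prefer the API to the Kibana UI, the equivalent request is a `PUT _cluster/settings` call on the local cluster. The sketch below builds that payload in Python; the alias `cluster_a` and the seed address `192.168.1.1:9300` are illustrative values, not anything your cluster requires.

```python
import json

# Sketch of the settings payload that registers the remote (leader) cluster
# on the LOCAL cluster (Cluster B). "cluster_a" is an illustrative alias;
# the seed host/port must point at a transport endpoint of Cluster A.
remote_settings = {
    "persistent": {
        "cluster": {
            "remote": {
                "cluster_a": {
                    "seeds": ["192.168.1.1:9300"]
                }
            }
        }
    }
}

# Send as: PUT _cluster/settings on Cluster B, e.g. with curl:
#   curl -X PUT "localhost:9200/_cluster/settings" \
#        -H "Content-Type: application/json" -d '<this JSON>'
print(json.dumps(remote_settings, indent=2))
```

The alias you choose here (`cluster_a`) is the name you will later reference when creating follower indices and auto-follow patterns.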
Step 2: Enable Soft Deletes on Leader Indices
To enable replication and follow an index, soft deletes must be enabled when the leader index is created. If soft deletes are not enabled on an existing index, you need to reindex it and use the new index as the leader index. With Elasticsearch 7.0 and later, soft deletes are enabled by default.
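For illustration, here is a sketch of index settings with soft deletes enabled explicitly. The index name and shard counts are hypothetical; on Elasticsearch 7.0+ you would normally omit these settings and rely on the defaults, so this matters mainly when re-creating an older index.

```python
import json

# Hypothetical settings body for creating a leader index with soft deletes
# explicitly enabled (the default on Elasticsearch 7.0+).
leader_index_settings = {
    "settings": {
        "index": {
            "soft_deletes": {
                "enabled": True,
                # how long to retain the history of operations for followers
                "retention_lease": {"period": "12h"}
            },
            "number_of_shards": 1,
            "number_of_replicas": 1
        }
    }
}

# Send as: PUT /leader-index on the remote (leader) cluster.
print(json.dumps(leader_index_settings, indent=2))
```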
Step 3: Create a Follower Index to Replicate the Index
The follower index will follow the leader index. To create the follower index, head out to the Stack Management dashboard in Kibana.
- From the side navigation, select Cross-Cluster Replication and choose the Follower Indices tab.
- Select the remote (leader) cluster that contains the index you want to replicate.
- Provide the name of the leader index and a name for the follower index.
Once you provide the names, Elasticsearch initializes a recovery process that transfers the existing Lucene segment files from the leader index to the follower index. During this process the follower’s status is Paused; it becomes Active once the transfer completes.
After successful replication, when you index the document in the leader index, Elasticsearch will automatically index the document in the follower index.
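The UI steps above map to a single CCR follow request on the local cluster. In this sketch, `cluster_a` is the remote-cluster alias from Step 1, and `leader-index`/`follower-index` are illustrative index names.

```python
import json

# Sketch of the CCR follow request body that creates a follower index.
# "cluster_a" is the alias the remote cluster was registered under;
# "leader-index" is the name of the index to replicate (both illustrative).
follow_request = {
    "remote_cluster": "cluster_a",
    "leader_index": "leader-index"
}

# Send as: PUT /follower-index/_ccr/follow?wait_for_active_shards=1
# on the LOCAL cluster (Cluster B).
print(json.dumps(follow_request, indent=2))
```

Once this request succeeds, documents indexed into `leader-index` on Cluster A are replicated into `follower-index` on Cluster B automatically.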
Step 4: Create an Auto-follow Pattern to Replicate Time-series Indices
For time-series indices, you can use an auto-follow pattern to create new followers. Whenever a new index matches the auto-follow pattern, a corresponding follower index is added to the local cluster.
An auto-follow pattern needs the name of the remote cluster you want to replicate from, and one or more index patterns that identify the time-series indices to replicate.
To create an auto-follow pattern, head out to Stack Management Dashboard in Kibana:
- From the side navigation, select Cross-Cluster Replication and choose the Auto-follow patterns tab.
- Provide the name for the auto-follow pattern.
- Select the remote cluster that contains the index.
- Provide one or more index patterns to identify the indices you want to replicate from the remote cluster.
For example, enter sparkLog-*, sysLog-* to automatically create followers for Spark log and system log indices.
- Use follower- as the prefix for follower indices to easily identify replicated indices.
Once the setup is done, whenever new indices matching these patterns are created on the remote, Elasticsearch automatically replicates them to local follower indices.
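The equivalent API call is a `PUT /_ccr/auto_follow/<pattern-name>` request on the local cluster. The sketch below reuses the `sparkLog-*`/`sysLog-*` example from above; `cluster_a` and the pattern name `logs-pattern` are illustrative.

```python
import json

# Sketch of an auto-follow pattern body matching the sparkLog-*/sysLog-*
# example. "cluster_a" is the illustrative remote-cluster alias from Step 1.
auto_follow_pattern = {
    "remote_cluster": "cluster_a",
    "leader_index_patterns": ["sparkLog-*", "sysLog-*"],
    # {{leader_index}} is substituted by Elasticsearch for each new match,
    # so sparkLog-2024.01.01 would be followed as follower-sparkLog-2024.01.01
    "follow_index_pattern": "follower-{{leader_index}}"
}

# Send as: PUT /_ccr/auto_follow/logs-pattern on the local cluster.
print(json.dumps(auto_follow_pattern, indent=2))
```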
Benefits of Cross-Cluster Elasticsearch Replication
With cross-cluster Elasticsearch replication, you can replicate indices across clusters to:
- Prevent search volume from impacting indexing throughput
- Reduce search latency by processing search requests in geo-proximity to the user
- Continue handling search requests in the event of a datacenter outage
In this blog post, we have discussed how easily you can manage Elasticsearch index replication using cross-cluster replication. This replication not only speeds up performance but also makes your system fault-tolerant. However, if you’re looking for a more straightforward solution, you can use Hevo Data – a no-code data pipeline that you can use to build an ETL pipeline in an instant.
Visit our Website to Explore Hevo
Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for analysis. It is a no-code data pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice. It provides you with a consistent and reliable solution for managing data in real-time, ensuring that you always have analysis-ready data in your desired destination.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Share your thoughts on Elasticsearch replication in the comments below!