In partnership with Elastic, we have made it easier than ever to connect always-on, always-moving data flowing through Confluent to Elastic’s real-time search and analytics database. With our seamless integrations, organizations can set their data in motion for a new generation of use cases that were not possible before.

– Jay Kreps, co-founder and CEO, Confluent

Confluent and Elastic are collaborating to make integrating Apache Kafka with Elasticsearch easier than ever. This allows enterprises to stream data from Kafka into Elasticsearch in real time, enabling log analysis, full-text search, and more.

The Elasticsearch Service Sink Connector in Confluent Cloud eliminates the need for clients to manage their own Kafka clusters. This allows businesses to stream data from Kafka to Elasticsearch across all major cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.

Prerequisites

To successfully set up your Kafka Elasticsearch Service Sink Connector, ensure that you meet the following requirements:

  • Confluent Platform 3.3.0 or above.
  • A Kafka cluster on the Confluent Cloud. If you haven’t set it up, then refer to this Quick Start Guide.
  • Authorized access to a Confluent Cloud cluster.
  • Java version 1.8.
  • Elasticsearch version 7.x.
  • The Elasticsearch Service and Confluent Cloud should be set to the same region.

Introduction to Elasticsearch


Elasticsearch is an Apache Lucene-based distributed Search and Analytics engine. It has swiftly become the most popular search engine since its introduction in 2010. It is frequently used for Log Analytics, full-text Search, Security Intelligence, Business Analytics, and a lot of other applications.

Elasticsearch is an open-source NoSQL database written in Java. It is available both On-Premises and in the Cloud. You can also integrate it with BI tools such as Power BI, Looker, and Tableau. Elasticsearch makes use of an Inverted Index data structure, which maps each term to the documents that contain it and allows extremely quick real-time full-text searches.
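To make the Inverted Index idea concrete, here is a minimal Python sketch. It is only an illustration of the data structure, not how Elasticsearch or Lucene implement it; the function names and the sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "kafka streams events to elasticsearch",
    2: "elasticsearch indexes json documents",
    3: "kafka stores events in a partitioned log",
}
index = build_inverted_index(docs)
print(sorted(search(index, "kafka events")))  # [1, 3]
```

Because the lookup goes from term to documents, a query only touches the entries for its own terms instead of scanning every document.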

The Elastic Stack, commonly known as ELK after its primary components Elasticsearch, Logstash, and Kibana, is widely used in the industry.

Key Features of Elasticsearch

Elasticsearch’s distributed nature, speed, scalability, and ability to index any document make it suitable for practically any application. Let’s explore these features a bit more:

  • Seamless and Fast: Elasticsearch employs schema-free JSON documents, supports simple REST-based APIs, and uses a basic HTTP interface, making it simple to get started and quickly create applications for a range of use-cases.
  • High Performance: Elasticsearch’s distributed nature allows it to analyze massive amounts of data in parallel, allowing it to quickly identify the best matches for your searches.
  • Automatic Node Recovery: If a Node departs the Elasticsearch Cluster for any reason, such as Node failure, the Master Node takes the required steps to manage the load by replacing the Node with its replica and rebalancing all Shards.
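As a rough illustration of the JSON-over-HTTP interface mentioned above, the following Python sketch builds (but does not send) two requests: one that stores a document and one that runs a full-text match query. The host, index name, and document are hypothetical; the `_doc` and `_search` paths follow the Elasticsearch 7.x REST API.

```python
import json
import urllib.request

ES_URL = "https://localhost:9200"  # hypothetical endpoint


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text match query against one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request("logs", "1", {"level": "error", "message": "disk full"})
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, assuming a reachable cluster and valid credentials.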

Introduction to Kafka


Apache Kafka is a distributed Event Streaming platform that enables applications to handle massive volumes of data in a short amount of time. Its fault-tolerant, highly scalable architecture is capable of managing billions of events.

Kafka’s ability to withstand peak data input volumes is a distinct and strong advantage. It can simply and quickly scale up and down with minimal downtime. Kafka’s popularity among other Data Streaming technologies has grown as a result of its minimal data redundancy and fault tolerance.

Key Features of Kafka

Apache Kafka is incredibly popular because of its features, which include ensuring uptime, making scaling simple, and allowing it to manage massive volumes. Let’s have a look at some of the powerful features it provides:

  • High Scalability: The partitioned log model used by Kafka distributes data over several servers, allowing it to extend beyond the capacity of a single server.
  • Extensibility: Since Kafka’s surge in popularity in recent years, many other applications have built connectors. This enables the installation of extra features, such as integration with other systems, in a matter of seconds. Check out how you can integrate Kafka with Amazon Redshift and Salesforce.
  • Metrics and Monitoring: Kafka is a popular tool for tracking operational data. This requires gathering data from several apps and consolidating it into centralized feeds with metrics. To read more about how you can analyze your data in Kafka, refer to Real-time Reporting with Kafka Analytics.
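The partitioned log model described under High Scalability can be sketched in a few lines of Python. This is a simplified illustration, not Kafka's implementation: the partition count is assumed, and CRC32 stands in for the hash function Kafka's default partitioner actually uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for this sketch


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the key's hash (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(log, key, value):
    """Append a record to its partition and return (partition, offset)."""
    partition = choose_partition(key)
    log[partition].append((key, value))
    return partition, len(log[partition]) - 1


log = defaultdict(list)
for key, value in [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]:
    print(key, value, append(log, key, value))
```

Records with the same key always land in the same partition, so their order is preserved, while different keys spread the load across servers.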

Steps to Set Up Elasticsearch Kafka Connection


The Kafka Elasticsearch Service Sink Connector for Confluent Cloud helps you seamlessly move your data from Kafka to Elasticsearch. It reads records from Kafka topics in several data formats (Avro, JSON Schema, Protobuf, or schemaless JSON) and publishes them from a Kafka topic to an Elasticsearch index.

In this section, you will learn the key steps to configure your Service Sink Connector using Confluent Cloud Console. So, get started with adding a connector and configuring it to stream events to an Elasticsearch deployment by following the steps listed below.

Step 1: Add the Kafka Elasticsearch Service Sink Connector

The first step is to add the connector to the Confluent Cloud. Follow the steps below:

  • Open the Kafka cluster you created on Confluent Cloud.
  • Now navigate to “Data Integration → Connectors” and click on “+ Add Connector”.
  • Next, click on the “Elasticsearch Service Sink” connector as shown below.
Kafka Elasticsearch - Elasticsearch Sink Connector

Step 2: Set Up Kafka Elasticsearch Connection

After adding the Kafka Elasticsearch Service Sink Connector, follow the steps below to set up the connection.

  • First, choose the Kafka Topics to read from and enter a name for the connector.
  • Next, choose one of the options provided to enter your Kafka Cluster credentials. You can either provide a service account resource ID or enter an API key and secret.
  • Now, select the input message format and provide the correct Elasticsearch connection details. You will be required to enter your connection URI, Elasticsearch deployment credentials, and other details.
  • After entering all the required information, enter the number of tasks the connector should use.
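The values collected in the steps above can be thought of as one configuration document. The Python sketch below assembles such a document and checks that the required entries are present. The property names reflect our understanding of the Confluent Cloud connector configuration and should be treated as assumptions to verify against the official documentation; every value in angle brackets is a placeholder, and `missing_fields` is a hypothetical helper.

```python
import json

# Property names are assumptions to verify against the Confluent Cloud docs;
# values in angle brackets are placeholders, not real credentials.
connector_config = {
    "name": "elasticsearch-sink-example",
    "connector.class": "ElasticsearchSink",
    "topics": "<topic-name>",
    "kafka.api.key": "<kafka-api-key>",
    "kafka.api.secret": "<kafka-api-secret>",
    "input.data.format": "JSON",
    "connection.url": "<elasticsearch-uri>",
    "connection.username": "<elasticsearch-username>",
    "connection.password": "<elasticsearch-password>",
    "tasks.max": "1",
}

# Entries this sketch treats as mandatory (mirrors the steps above).
REQUIRED = ["topics", "input.data.format", "connection.url", "tasks.max"]


def missing_fields(config):
    """Return the required keys that are absent or empty in the config."""
    return [key for key in REQUIRED if not config.get(key)]


print(json.dumps(connector_config, indent=2))
print("missing:", missing_fields(connector_config))
```

Keeping the configuration in one document makes it easy to review before launching and to reuse later with a command-line workflow.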

Step 3: Launch the Kafka Elasticsearch Service Sink Connector

Now that you have configured your Connection, you can launch the Service Sink Connector. Before you launch it, make sure to verify all your connection details carefully. Click “Launch” once you have verified the details. An example of connection details can be seen below.

Kafka Elasticsearch - Connection Details

Step 4: Check Kafka Elasticsearch Connection Status

Before you move your data from your Kafka cluster to Elasticsearch, check that the Kafka Elasticsearch Connection has been set up properly. Once the connection succeeds, the connector status changes from Provisioning to Running, as seen below.

Kafka Elasticsearch - Connection Status

Good work! You have successfully configured your Service Sink Connector in Confluent Cloud using the Confluent Cloud Console. Now you can move your Kafka data to Elasticsearch easily.
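The status check from Step 4 can also be scripted. The sketch below polls until the connector reports Running. It assumes a caller-supplied `get_status` function (hypothetical; it could wrap an API or CLI call) and treats a "Failed" status as terminal, which is an assumption and not something the Console steps above describe.

```python
import time


def wait_until_running(get_status, timeout_s=300, poll_s=10, sleep=time.sleep):
    """Poll get_status() until it reports Running, Failed, or the timeout elapses.

    get_status is a hypothetical callable returning the current status string.
    Returns True when the connector is running, False otherwise.
    """
    waited = 0
    while True:
        status = get_status().upper()
        if status == "RUNNING":
            return True
        if status == "FAILED" or waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Simulated status sequence standing in for real responses.
statuses = iter(["Provisioning", "Provisioning", "Running"])
print(wait_until_running(lambda: next(statuses), sleep=lambda s: None))  # True
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.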

If you wish to set up your Kafka Elasticsearch Service Sink Connector in Confluent Cloud using Confluent CLI, you can refer to the following Documentation.

Key Features of the Elasticsearch Kafka Connector

The Kafka Elasticsearch Sink Connector makes it easy to link Apache Kafka and Elasticsearch. You can stream data from Kafka into Elasticsearch to perform Log Analysis or full-text Search. You can also benefit from being able to do real-time Analytics or integrate it with other platforms like Kibana.

1) Easy Mapping Management

The Kafka Elasticsearch Connector can infer mappings from Connect schemas. When this feature is enabled, the connector generates mappings based on the schemas of the Kafka messages.

The inference is limited to field types and to default values for fields that are missing. If you require further adjustments, you need to create the mappings manually.
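A rough Python sketch of this kind of inference is shown below. The type table is illustrative and may differ from the one the connector actually uses, the input format (`{field: (type, default)}`) is invented for this example, and using `null_value` to carry a default is an assumption about how defaults could be expressed in a mapping.

```python
# Illustrative type table; the connector's actual table may differ.
CONNECT_TO_ES_TYPE = {
    "int8": "byte",
    "int16": "short",
    "int32": "integer",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "boolean": "boolean",
    "string": "text",
    "bytes": "binary",
}


def infer_mapping(schema_fields):
    """Build an Elasticsearch-style mapping from {field: (type, default)} pairs."""
    properties = {}
    for name, (connect_type, default) in schema_fields.items():
        field = {"type": CONNECT_TO_ES_TYPE[connect_type]}
        if default is not None:
            # Assumption: a schema default is expressed via null_value.
            field["null_value"] = default
        properties[name] = field
    return {"properties": properties}


# Hypothetical message schema.
schema = {"count": ("int32", 0), "message": ("string", None)}
print(infer_mapping(schema))
```

Anything beyond types and defaults, such as custom analyzers, is outside what this kind of inference can produce, which is why manual mappings are sometimes needed.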

2) Seamless Schema Management

The Kafka Elasticsearch Connector can handle backward, forward, and other compatible schema updates in Connect. It also facilitates schema evolution.

When Elasticsearch discovers a previously unknown field in a document, it implements dynamic mapping to identify the data type for the field and adds the new field to the type mapping automatically.

3) Effective Multitasking

To increase throughput, the Elasticsearch Connector supports batching and pipelined writes to Elasticsearch. It collects messages in batches and allows many batches to be processed simultaneously. The number of tasks can be specified in the tasks.max configuration option. Hence, when numerous files need to be processed, this can result in significant performance benefits.
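The combination of batching and concurrent (pipelined) writes can be sketched as follows. This is a simplified model, not the connector's code: `write_batch` is a stand-in for a bulk request, and the `batch_size` and `max_in_flight` parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def write_batch(batch):
    """Stand-in for a bulk request; returns how many records were 'written'."""
    return len(batch)


def write_all(records, batch_size=3, max_in_flight=2):
    """Send batches concurrently, with at most max_in_flight at a time."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return sum(pool.map(write_batch, batches(records, batch_size)))


records = [{"offset": i} for i in range(10)]
print(write_all(records))  # 10
```

Batching reduces the number of round trips, and keeping several batches in flight hides the latency of each individual request.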

4) Supports Exactly Once Delivery

Elasticsearch’s idempotent write semantics allow the Kafka Elasticsearch Connector to achieve exactly-once delivery to Elasticsearch. The connector does this by setting explicit IDs on the Elasticsearch documents it writes, so that writing the same record twice overwrites the same document instead of creating a duplicate.

If keys are provided in Kafka messages, they are automatically converted to Elasticsearch document IDs. However, if the keys aren’t specified or are explicitly ignored, the connector uses topic+partition+offset as the key, guaranteeing that each message in Kafka corresponds to precisely one document in Elasticsearch.
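The ID rule described above can be sketched in Python. The functions and the dictionary standing in for an Elasticsearch index are hypothetical; only the topic+partition+offset scheme comes from the text.

```python
def document_id(topic, partition, offset, key=None, ignore_key=False):
    """Use the message key as the ID, else fall back to topic+partition+offset."""
    if key is not None and not ignore_key:
        return str(key)
    return f"{topic}+{partition}+{offset}"


def write(index, doc_id, document):
    """Idempotent write: the same ID always overwrites the same document."""
    index[doc_id] = document


index = {}  # a plain dict standing in for an Elasticsearch index
record = {"topic": "orders", "partition": 0, "offset": 42, "value": {"total": 10}}
for _ in range(2):  # simulate a redelivery of the same record
    doc_id = document_id(record["topic"], record["partition"], record["offset"])
    write(index, doc_id, record["value"])
print(index)  # {'orders+0+42': {'total': 10}}
```

Even though the record is delivered twice, the index ends up with a single document, which is the effect the exactly-once guarantee relies on.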

5) Prevents Thundering Herd

If the Elasticsearch service is inadvertently overloaded, the Kafka Elasticsearch connector may be unable to write to the Elasticsearch endpoint.

The connector will retry the request several times before failing, using an exponential backoff mechanism to give the Elasticsearch service time to recover and to prevent additional overloading.

The connector also adds jitter to the computed backoff timings to prevent a thundering herd, which occurs when a large number of requests from several tasks are made at the same time and overload the service.
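A generic sketch of retries with exponential backoff and jitter is shown below. It illustrates the technique only; the parameter names, default values, and the choice of "full jitter" (a random fraction of the capped delay) are assumptions, not the connector's actual settings.

```python
import random
import time


def backoff_delay(attempt, base_s=0.1, max_s=10.0, rng=random.random):
    """Return a jittered delay: a random fraction of the capped exponential."""
    ceiling = min(max_s, base_s * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(request, max_retries=5, sleep=time.sleep):
    """Call request(), retrying on ConnectionError with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(backoff_delay(attempt))


# Simulated endpoint that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service overloaded")
    return "ok"


print(call_with_retries(flaky_request, sleep=lambda s: None))  # ok
```

Because each task draws a different random delay, retries from many tasks spread out over time instead of arriving together.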

6) Secure Connection

Once the connection credentials are configured, the Elasticsearch connector can write data to a secure Elasticsearch cluster that supports basic authentication. By following the steps in the Kafka Connect Security guidelines, you can securely stream your data from Kafka to Elasticsearch.
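For reference, HTTP basic authentication sends the username and password as a Base64-encoded `Authorization` header. The Python sketch below shows how such a header is formed; the credentials are placeholders, and the header should only ever be sent over HTTPS because Base64 is an encoding, not encryption.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
header = basic_auth_header("<elasticsearch-username>", "<elasticsearch-password>")
print(header["Authorization"].split()[0])  # Basic
```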

Limitations of Elasticsearch Kafka Connection

The Kafka Elasticsearch Service Sink Connector in Confluent Cloud was introduced in the previous year, and the Kafka and Confluent teams are still working to improve it. For now, the connector has a few limitations, which are listed below:

  • The connector is only compatible with Elastic Cloud’s Elasticsearch Service.
  • Elasticsearch version 7.1 and later are supported by the connector. It does not support Elasticsearch version 8.x.
  • The target Elasticsearch deployment and the Confluent Cloud cluster must be in the same region.
  • Single Message Transformations (SMTs) that change the topic name are presently not supported by the Kafka Elasticsearch Sink Connector. SMTs alter incoming messages after they’ve been produced by a source connector but before they’re published to Kafka. Outbound messages are transformed by SMTs before being transmitted to a sink connector.
  • The following transformations are also not allowed by the Kafka Elasticsearch Connector:
    • org.apache.kafka.connect.transforms.TimestampRouter
    • io.confluent.connect.transforms.MessageTimestampRouter
    • io.confluent.connect.transforms.ExtractTopic$Key
    • io.confluent.connect.transforms.ExtractTopic$Value
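A configuration can be checked against the list above before it is submitted. The sketch below assumes the standard Kafka Connect convention of a comma-separated `transforms` property with a `transforms.<alias>.type` entry per alias; the function name and the sample configuration are hypothetical.

```python
UNSUPPORTED_SMTS = {
    "org.apache.kafka.connect.transforms.TimestampRouter",
    "io.confluent.connect.transforms.MessageTimestampRouter",
    "io.confluent.connect.transforms.ExtractTopic$Key",
    "io.confluent.connect.transforms.ExtractTopic$Value",
}


def unsupported_transforms(config):
    """Return aliases of configured transforms whose type is not allowed."""
    aliases = [a.strip() for a in config.get("transforms", "").split(",") if a.strip()]
    return [a for a in aliases if config.get(f"transforms.{a}.type") in UNSUPPORTED_SMTS]


# Hypothetical connector configuration with one allowed and one disallowed SMT.
config = {
    "transforms": "mask, route",
    "transforms.mask.type": "org.apache.kafka.connect.transforms.MaskField$Value",
    "transforms.route.type": "org.apache.kafka.connect.transforms.TimestampRouter",
}
print(unsupported_transforms(config))  # ['route']
```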


Conclusion

This article helped you gain knowledge about Elasticsearch and Kafka. You explored the key features of these services and also learned the capabilities of Kafka Elasticsearch Service Sink Connector.

Moreover, you understood the key steps to configure your Elasticsearch Connector in Confluent Cloud.

You can also use the Elasticsearch Connector for your Confluent On-Premise platform. You can refer to Elasticsearch Service Sink Connector for Confluent Platform documentation.

Shubhnoor Gill
Research Analyst, Hevo Data

Shubhnoor is a data analyst with a proven track record of translating data insights into actionable marketing strategies. She leverages her expertise in market research and product development, honed through experience across diverse industries and at Hevo Data. Currently pursuing a Master of Management in Artificial Intelligence, Shubhnoor is a dedicated learner who stays at the forefront of data-driven marketing trends. Her data-backed content empowers readers to make informed decisions and achieve real-world results.
