Data export and analytics are increasingly important for brands that analyze structured and unstructured data to fine-tune their business strategies. Elasticsearch is one of the platforms gaining popularity in the analytics domain.
This article covers methods for exporting data from Elasticsearch so that it can be used in tandem with other platforms, with code snippets for each method. It also discusses the challenges of manual scripting for data export and possible ways to overcome them.
What is Elasticsearch?
Elasticsearch is an open-source, distributed search and analytics engine with a robust REST API, built for speed and scalability. It can analyze textual, numerical, and geospatial data.
It also facilitates data ingestion, storage, analysis, enrichment, and visualization. With large datasets, exporting data reliably and quickly becomes a real concern.
Why Export Elasticsearch Data?
Data in Elasticsearch can be manipulated and enriched to derive useful insights, but the results often need to be moved to other platforms that a company uses for storage or further processing.
For this reason, companies commonly integrate Elasticsearch with other systems and export its data to their preferred storage or warehouse locations.
Hevo is a No-Code Data Pipeline. It supports pre-built data integrations from 100+ data sources, including Elasticsearch.
Hevo offers a fully managed solution for your data migration process. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data at your data warehouse.
Let’s look at some salient features of Hevo:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
- Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Elasticsearch Export Methods
There are several ways to export data from Elasticsearch, depending on the file type you need. Whether you export your data as a CSV file, an HTML table, or a JSON file depends on your company's use case.
Here are three popular methods you can use to export data from Elasticsearch to the warehouse or platform of your choice:
1. Export Elasticsearch Data: Using Logstash-Input-Elasticsearch Plugin
There are several plug-ins that you can use for rapid data export from Elasticsearch. If you are looking for data export as CSV files, this method can be useful.
Run the following commands to install the Logstash-Input-Elasticsearch plugin. Older Logstash releases shipped the installer as bin/plugin, which is deprecated in favor of bin/logstash-plugin:
# cd /opt/logstash/
# bin/logstash-plugin install logstash-input-elasticsearch
Next, install the Logstash-Output-CSV plugin to receive the output in CSV format:
# cd /opt/logstash
# bin/logstash-plugin install logstash-output-csv
Now you can write your query in the input section and have the matching documents written to a CSV file. The output plugin picks out the fields configured in Logstash and saves them directly to the CSV file.
This configuration can be seen in the following script:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "index-we-are-reading-from"
    # Insert your Elasticsearch query here; match_all is only a placeholder
    query => '{ "query": { "match_all": {} } }'
  }
}
output {
  csv {
    # These are the fields that you would like to output in CSV format.
    # Each field needs to be one of the fields shown in the output
    # when you run your Elasticsearch query.
    fields => ["field1", "field2", "field3"]
    # This is where we store the output. We can use several files to store
    # our output by using a timestamp to determine the filename.
    path => "/tmp/csv-export.csv"
  }
}
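The field selection performed by the CSV output can also be sketched in plain Python, which may help when checking what a given field list will produce. This is a minimal illustration and not part of Logstash: the function name hits_to_csv and the sample hits are hypothetical, and the hits are assumed to have the shape of response["hits"]["hits"] from a search call. Unlike the configuration above, this sketch also writes a header row.

```python
import csv
import io


def hits_to_csv(hits, fields, out):
    """Write the chosen _source fields of each hit as one CSV row."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for hit in hits:
        source = hit.get("_source", {})
        # Fields missing from a document become empty cells.
        writer.writerow({name: source.get(name, "") for name in fields})


# Hypothetical hits, shaped like response["hits"]["hits"] from a search.
sample_hits = [
    {"_id": "1", "_source": {"field1": "a", "field2": 1, "field3": True}},
    {"_id": "2", "_source": {"field1": "b", "field2": 2}},
]

buffer = io.StringIO()
hits_to_csv(sample_hits, ["field1", "field2", "field3"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing an open file instead of the in-memory buffer writes the same rows to disk.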
2. Elasticsearch Export: Using Elasticsearch Dump
Another way to export data is Elasticdump (elasticsearch-dump), a command-line tool for moving and saving indices. It reads documents from an input and writes them as JSON to an output, which can then be stored in the specified location.
Use one of the following commands to install Elasticdump, either locally or globally:
npm install elasticdump
npm install elasticdump -g
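After installation, you specify an input and an output for the process, each of which can be either a URL or a file. The examples below show data export using elasticdump; the host name and the ${...} variables are placeholders for your own values:
# Export ES data to S3 (using s3urls)
elasticdump --s3AccessKeyId "${access_key_id}" --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index --output "s3://${bucket_name}/${file_name}.json"
# Export ES data to MINIO (s3 compatible) (using s3urls)
elasticdump --s3AccessKeyId "${access_key_id}" --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index --output "s3://${bucket_name}/${file_name}.json" \
  --s3ForcePathStyle true --s3Endpoint https://production.minio.co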
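You can also use multielasticdump to run a similar export for multiple indices at once (the host name and output folder are placeholders):
# backup ES indices & all their type to the es_backup folder
multielasticdump --direction=dump --match='^.*$' \
  --input=http://production.es.com:9200 --output=/tmp/es_backup
# Only backup ES indices ending with a suffix of `-index` (match regex).
# Only the indices data will be backed up. All other types are ignored.
# NB: analyzer & alias types are ignored by default
multielasticdump --direction=dump --match='^.*-index$' --ignoreType='mapping,settings,template' \
  --input=http://production.es.com:9200 --output=/tmp/es_backup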
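Once a data dump has been written to a local file, it can be post-processed with a few lines of Python. The sketch below assumes the file holds one JSON document per line, each with _id and _source keys, which is the layout elasticdump typically produces for data dumps; check your own files before relying on it. The function name read_dump and the sample documents are hypothetical.

```python
import json
import os
import tempfile


def read_dump(path):
    """Yield (doc_id, source) pairs from a line-delimited JSON dump file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            doc = json.loads(line)
            yield doc.get("_id"), doc.get("_source", {})


# Hypothetical two-document dump, written here only to make the sketch runnable.
sample_docs = [
    {"_index": "my_index", "_id": "1", "_source": {"name": "Ada"}},
    {"_index": "my_index", "_id": "2", "_source": {"name": "Linus"}},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    for doc in sample_docs:
        tmp.write(json.dumps(doc) + "\n")
    dump_path = tmp.name

records = list(read_dump(dump_path))
os.remove(dump_path)
print(records)
```

Because the function is a generator, large dump files are read one line at a time instead of being loaded into memory at once.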
3. Elasticsearch Export: Using Python Pandas
You can combine the Elasticsearch Python client with Pandas to export documents in HTML, CSV, or JSON formats.
Install Python 3 and pip using the commands for your distribution, then install the packages that the script needs:
# Debian/Ubuntu
sudo apt install python3-pip
# CentOS/RHEL
sudo yum install python36
sudo yum install python36-devel
sudo yum install python36-setuptools
sudo easy_install-3.6 pip
# packages used by the script
pip3 install elasticsearch pandas numpy
The complete script for exporting data in any of these formats using Python's Pandas is as follows:
# -*- coding: utf-8 -*-
import sys, time, io

start_time = time.time()

if sys.version_info[0] < 3:
    print("\nThis script requires Python 3")
    print("Please run the script using the 'python3' command.\n")
    sys.exit(1)

try:
    # import the Elasticsearch low-level client library
    from elasticsearch import Elasticsearch

    # import Pandas, JSON, and the NumPy library
    import pandas, json
    import numpy as np

except ImportError as error:
    print("\nImportError:", error)
    print("Please use 'pip3' to install the necessary packages.")
    sys.exit(1)

# create a client instance of the library
# (assumes a cluster listening on http://localhost:9200)
print("\ncreating client instance of Elasticsearch")
elastic_client = Elasticsearch("http://localhost:9200")

# Make the API call to the cluster and convert
# the response object to a list of documents

# total num of Elasticsearch documents to get with API call
total_docs = 20
print("\nmaking API call to Elasticsearch for", total_docs, "documents.")
response = elastic_client.search(
    index="my-index",  # placeholder: replace with the name of your index
    size=total_docs
)

# grab list of docs from nested dictionary response
print("putting documents in a list")
elastic_docs = response["hits"]["hits"]

# Get all of the Elasticsearch index's fields from _source

# collect one Pandas Series per document
series_list = []

# iterate each Elasticsearch doc in list
print("\ncreating objects from Elasticsearch data.")
for doc in elastic_docs:
    # get _source data dict from document
    source_data = doc["_source"]

    # get _id from document
    _id = doc["_id"]

    # create a Series object from doc dict object
    doc_data = pandas.Series(source_data, name=_id)
    series_list.append(doc_data)

# build the DataFrame in one step; DataFrame.append() was removed in pandas 2.0
docs = pandas.DataFrame(series_list)

# Export the Elasticsearch documents held in the Pandas DataFrame
print("\nexporting Pandas objects to different file types.")

# export the Elasticsearch documents as a JSON file
docs.to_json("objectrocket.json")

# have Pandas return a JSON string of the documents
json_export = docs.to_json()  # return JSON data
print("\nJSON data:", json_export)

# export Elasticsearch documents to a CSV file
docs.to_csv("objectrocket.csv", sep=",")  # CSV delimited by commas

# have Pandas return a CSV string of the documents
csv_export = docs.to_csv(sep=",")  # CSV delimited by commas
print("\nCSV data:", csv_export)

# create IO HTML string
html_str = io.StringIO()

# export as HTML
docs.to_html(buf=html_str)

# print out the HTML table
print(html_str.getvalue())

# save the Elasticsearch documents as an HTML table
docs.to_html("objectrocket.html")

print("\n\ntime elapsed:", time.time() - start_time)
Code Snippets from Qbox, GitHub and ObjectRocket
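The Pandas script requests only total_docs (20) documents in a single call, so a full export of a larger index needs some form of paging. One option, which is not part of the original script, is the search_after parameter of the search API: each request is sorted, and the next request resumes after the sort values of the last hit. The sketch below is a minimal illustration under stated assumptions: iter_all_hits and fake_search are hypothetical names, the sort field is assumed to be unique per document, and a stand-in function replaces the real cluster so that the example runs on its own.

```python
def iter_all_hits(search, sort_field, page_size=100):
    """Yield every hit by paging with search_after.

    `search` is any callable that takes a request-body dict and returns a
    search response dict; `sort_field` must be unique per document.
    """
    body = {"size": page_size, "sort": [{sort_field: "asc"}]}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        body = dict(body, search_after=hits[-1]["sort"])


def fake_search(body):
    # Stand-in for a real cluster: 5 documents sorted by a hypothetical "seq" field.
    docs = [{"_id": str(n), "_source": {"seq": n}, "sort": [n]} for n in range(1, 6)]
    after = body.get("search_after", [0])[0]
    remaining = [d for d in docs if d["sort"][0] > after]
    return {"hits": {"hits": remaining[: body["size"]]}}


all_ids = [hit["_id"] for hit in iter_all_hits(fake_search, "seq", page_size=2)]
print(all_ids)
```

Against a real cluster, the search argument would wrap the client's search method for your index; how the request body is passed (as a body argument or as keyword arguments) depends on the client version. The elasticsearch.helpers.scan helper in the Python client is another way to iterate over all documents of an index.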
Limitations of Manual Export
While the methods listed above are straightforward to follow, manual export has known limitations. First, installing the required plug-ins and running these scripts takes solid technical knowledge. Many engineers will find the task manageable, but non-technical teams that want to export their data may not be able to do it readily.
Second, any manual integration can introduce errors and inconsistent or invalid code that disrupt the overall process. If implemented incorrectly, an export can also lose data or corrupt important files. These issues can be addressed by using no-code platforms that automate the export process.
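One simple safeguard against silent data loss in a scripted export is to compare the number of exported rows with the number of documents expected. The sketch below is only an illustration of that check: the function names are hypothetical, the sample CSV is made up, and the expected count is assumed to come from elsewhere, for example from the count API of the Elasticsearch client.

```python
import csv
import io


def count_csv_rows(handle, has_header=True):
    """Count data rows in an exported CSV, optionally skipping a header row."""
    rows = sum(1 for _ in csv.reader(handle))
    if has_header and rows > 0:
        rows -= 1
    return rows


def export_is_complete(handle, expected_docs, has_header=True):
    """Return (ok, exported), where ok is True when the counts match."""
    exported = count_csv_rows(handle, has_header)
    return exported == expected_docs, exported


# Made-up export with two data rows, checked against an expected count of 3.
sample_csv = io.StringIO("field1,field2\na,1\nb,2\n")
ok, exported = export_is_complete(sample_csv, expected_docs=3)
print(ok, exported)
```

A mismatch does not say which documents are missing, but it flags the export for a closer look before the file is loaded into a warehouse.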
In this blog, you learned about three different methods of Elasticsearch export. You can select the method that suits you to export Elasticsearch data for use across multiple platforms. Running scripts with the selected plug-ins handles the task manually, while other tools automate the process for you. Choose the channel that lets you use Elasticsearch in tandem with your data warehouse. If you are looking for an automated solution to export data from Elasticsearch to a data warehouse, try Hevo.
Hevo is a No-Code Data Pipeline. It supports pre-built integrations from 100+ data sources at a reasonable price. With Hevo, you can migrate your Elasticsearch data to your data warehouse in minutes.
Share your experience of exporting Elasticsearch data in the comment section below.