Data export and analytics are increasingly important for brands that analyze structured and unstructured data to fine-tune their business strategies. Elasticsearch is one of the platforms gaining popularity in the analytics domain.
This article covers methods to export data from Elasticsearch for use with other platforms, along with detailed code snippets to implement them. It also sheds light on the challenges of manual scripting for data export and possible ways to overcome them.
What is Elasticsearch?
Elasticsearch is an open-source search and analytics engine with a robust REST API, a distributed architecture, and the speed and scalability to serve multiple platforms. It supports analysis of textual, numerical, and even geospatial data for virtually any use case.
It also supports comprehensive data ingestion, storage, analysis, enrichment, and visualization. With bulk datasets, however, reliable and fast data export becomes a valid concern.
Key Features of Elasticsearch
- Distributed Architecture: Scales horizontally by distributing data across multiple nodes.
- Real-Time Search and Analytics: Provides near real-time search and analytics capabilities.
- Full-Text Search: Offers powerful full-text search, including natural language processing.
- RESTful API: Interacts with Elasticsearch via HTTP using JSON over a RESTful API, as shown in the sketch after this list.
- Document-Oriented: Stores data in JSON format, making it easy to index and query.
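To illustrate the RESTful, document-oriented model described above, here is a minimal sketch in Python that indexes one JSON document and runs a full-text query over HTTP. It assumes a local, unsecured cluster at http://localhost:9200, and the employees index and its fields are only placeholders; adjust the URL and security settings for your own deployment.
import requests

ES = "http://localhost:9200"   # assumed local, unsecured cluster
INDEX = "employees"            # placeholder index name

# Index (create or overwrite) a single JSON document with id 1.
# refresh=true makes the document searchable immediately.
doc = {"name": "Jane Doe", "title": "Data Engineer", "location": "Berlin"}
print(requests.put(f"{ES}/{INDEX}/_doc/1", json=doc, params={"refresh": "true"}).json())

# Run a full-text match query against the same index.
query = {"query": {"match": {"title": "engineer"}}}
response = requests.get(f"{ES}/{INDEX}/_search", json=query)
for hit in response.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"])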
Why Export Elasticsearch Data?
Exporting data from Elasticsearch is vital for various reasons, such as:
- Compliance with Data Governance Policies
- Data Backup and Recovery
- Data Migration
- Data Analysis and Reporting
- Cross Platform Integrations
Trusted by 2000+ customers across 40+ countries, Hevo elevates your data migration game with its no-code platform. Ensure seamless data migration using features like:
- Seamless integration with sources like Elasticsearch and 150 others.
- Transform and map data easily with drag-and-drop features.
- Real-time data migration to leverage AI/ML features of BigQuery and Synapse.
Still not sure? See how Postman, the world’s leading API platform, used Hevo to save 30-40 hours of developer efforts monthly and found a one-stop solution for all its data integration needs.
Get Started with Hevo for Free
Elasticsearch Export Methods
There are several ways to export data from Elasticsearch, depending on the type of file you want to produce. Whether you export your data as a CSV file, an HTML table, or a JSON file depends on your company's use case.
Here are four popular methods you can use to export data from Elasticsearch to the warehouse or platform of your choice:
1. The Easiest and Fastest Method to Perform Elasticsearch Export: Using Hevo
Step 1: Configure Elasticsearch as your Source.
Step 2: Configure your desired Destination.
You can choose your desired destination from the list of data warehouses and databases provided by Hevo to export your Elasticsearch Data.
2. Export Elasticsearch Data: Using the Logstash-Input-Elasticsearch Plugin
There are several plugins you can use for rapid data export from Elasticsearch. If you want to export your data as CSV files, this method can be useful.
Run the following script to install the Logstash-Input-Elasticsearch Plugin:
# cd /opt/logstash/
# bin/plugin install logstash-input-elasticsearch
The use of bin/plugin is deprecated and will be removed
in a feature release. Please use bin/logstash-plugin.
Validating logstash-input-elasticsearch
Installing logstash-input-elasticsearch
Installation successful
After this, you will also need to install the Logstash-Output-CSV plugin to receive the output in the desired format:
# cd /opt/logstash
# bin/logstash-plugin install logstash-output-csv
Validating logstash-output-csv
Installing logstash-output-csv
Installation successful
Now, you can write your query in the input section to return JSON values that are written out as a CSV file. The output plugin picks out the specific fields configured in Logstash and saves them directly as CSV as a result of the query.
This configuration can be seen in the following script:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "index-we-are-reading-from"
    query => '
    {
      "query": {
        ..
        # Insert your Elasticsearch query here
      }
    }'
  }
}
output {
  csv {
    # These are the fields that you would like to output in CSV format.
    # Each field needs to be one of the fields shown in the output
    # when you run your Elasticsearch query.
    fields => ["field1", "field2", "field3", "field4", "field5"]
    # This is where we store the output. We can use several files to store our
    # output by using a timestamp to determine the filename.
    path => "/tmp/csv-export.csv"
  }
}
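Save this configuration to a file (the filename here is just an example, e.g. export-csv.conf) and run the pipeline with bin/logstash -f export-csv.conf from the Logstash installation directory; the fields selected by your query should then be written to /tmp/csv-export.csv.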
3. Elasticsearch Export: Using Elasticsearch Dump
Another way to export data in the desired format is by using Elasticsearch dump (elasticdump). Elasticdump reads data from a specified input and writes it to a specified output, converting it into the desired format along the way.
Use the following commands to install Elasticdump:
npm install elasticdump
./bin/elasticdump
OR
npm install elasticdump -g
elasticdump
After installation, you can specify an input and an output for the process, each of which can be either a URL or a file. Data export using elasticdump is shown below:
# Export ES data to S3 (using s3urls)
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json"

# Export ES data to MINIO (s3 compatible) (using s3urls)
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json" \
  --s3ForcePathStyle true \
  --s3Endpoint https://production.minio.co
You can also use MultiElasticDump to run a similar script for multiple indices at once:
# Backup ES indices & all their types to the es_backup folder
multielasticdump \
  --direction=dump \
  --match='^.*$' \
  --input=http://production.es.com:9200 \
  --output=/tmp/es_backup

# Only backup ES indices ending with the suffix `-index` (match regex).
# Only the indices' data will be backed up. All other types are ignored.
# NB: analyzer & alias types are ignored by default
multielasticdump \
  --direction=dump \
  --match='^.*-index$' \
  --input=http://production.es.com:9200 \
  --ignoreType='mapping,settings,template' \
  --output=/tmp/es_backup
4. Elasticsearch Export: Using Python Pandas
Python's Pandas library can be used to export Elasticsearch documents in various formats, including HTML, CSV, and JSON.
Install Python 3 and pip using the commands for your distribution (apt for Debian/Ubuntu, yum for CentOS/RHEL):
sudo apt install python3-pip
sudo yum install python36
sudo yum install python36-devel
sudo yum install python36-setuptools
sudo easy_install-3.6 pip
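With pip in place, also install the Python packages that the script below imports; something like pip3 install elasticsearch pandas numpy should cover them.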
The complete script for initiating a data export in any of these formats using Python's Pandas is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys, time, io

start_time = time.time()

if sys.version[0] != "3":
    print("\nThis script requires Python 3")
    print("Please run the script using the 'python3' command.\n")
    quit()

try:
    # import the Elasticsearch low-level client library
    from elasticsearch import Elasticsearch
    # import Pandas, JSON, and the NumPy library
    import pandas, json
    import numpy as np
except ImportError as error:
    print("\nImportError:", error)
    print("Please use 'pip3' to install the necessary packages.")
    quit()

# create a client instance of the library
print("\ncreating client instance of Elasticsearch")
elastic_client = Elasticsearch()

"""
MAKE API CALL TO CLUSTER AND CONVERT
THE RESPONSE OBJECT TO A LIST OF
ELASTICSEARCH DOCUMENTS
"""
# total num of Elasticsearch documents to get with API call
total_docs = 20
print("\nmaking API call to Elasticsearch for", total_docs, "documents.")
response = elastic_client.search(
    index='employees',
    body={},
    size=total_docs
)

# grab list of docs from nested dictionary response
print("putting documents in a list")
elastic_docs = response["hits"]["hits"]

"""
GET ALL OF THE ELASTICSEARCH
INDEX'S FIELDS FROM _SOURCE
"""
# collect one Pandas Series object per document
doc_rows = []
# iterate each Elasticsearch doc in list
print("\ncreating objects from Elasticsearch data.")
for num, doc in enumerate(elastic_docs):
    # get _source data dict from document
    source_data = doc["_source"]
    # get _id from document
    _id = doc["_id"]
    # create a Series object from the doc dict and collect it
    doc_rows.append(pandas.Series(source_data, name=_id))

# build a Pandas DataFrame object from the collected Series objects
docs = pandas.DataFrame(doc_rows)

"""
EXPORT THE ELASTICSEARCH DOCUMENTS PUT INTO
PANDAS OBJECTS
"""
print("\nexporting Pandas objects to different file types.")

# export the Elasticsearch documents as a JSON file
docs.to_json("objectrocket.json")

# have Pandas return a JSON string of the documents
json_export = docs.to_json()  # return JSON data
print("\nJSON data:", json_export)

# export Elasticsearch documents to a CSV file
docs.to_csv("objectrocket.csv", sep=",")  # CSV delimited by commas

# export Elasticsearch documents to a CSV string
csv_export = docs.to_csv(sep=",")  # CSV delimited by commas
print("\nCSV data:", csv_export)

# create IO HTML string
html_str = io.StringIO()

# export as HTML
docs.to_html(
    buf=html_str,
    classes='table table-striped'
)

# print out the HTML table
print(html_str.getvalue())

# save the Elasticsearch documents as an HTML table
docs.to_html("objectrocket.html")

print("\n\ntime elapsed:", time.time() - start_time)
Code Snippets from Qbox, GitHub and ObjectRocket
Limitations of Manually Exporting Elasticsearch Data
- No Versioning: Manual exports may not track versions or incremental changes, making it hard to synchronize data over time.
- Time-Consuming: Manual exports can be slow, especially for large datasets, as each document must be retrieved and processed.
- Error-Prone: Human errors like incorrect queries, incomplete exports, or missed data can occur, affecting the reliability of the process.
- Resource Intensive: Manual exports can put additional load on the Elasticsearch cluster, slowing down performance for other users.
- Limited Scalability: Difficult to handle large datasets or frequent exports without automation, which limits scalability.
For more details, read our additional resource on Connecting Elasticsearch to BigQuery.
Conclusion
In this blog, you learned about Elasticsearch export using four different methods. You can select your preferred method to export Elasticsearch data for robust use across multiple platforms. Running scripts via the selected plugins can help carry out this task, while other methods automate the process for you. Choose your desired channel to reap the benefits of Elasticsearch in tandem with your data warehouse. However, if you are looking for an automated solution to export data from Elasticsearch to a data warehouse, then try Hevo.
Hevo is a No-Code Data Pipeline. It supports pre-built integrations from 150+ data sources at a reasonable price. Sign up for Hevo’s 14-day free trial to migrate your Elasticsearch data to your data warehouse in minutes.
FAQ on How to Download Data from an Elasticsearch Index
Can we export data from Elasticsearch?
Yes, you can export data from Elasticsearch using tools like elasticsearch-dump or by querying and saving data using APIs.
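As a hedged illustration of the API route, the sketch below uses the official Python client's scan helper (a wrapper around the scroll API) to stream every document from an index into a newline-delimited JSON file. The index name, output path, and connection URL are placeholders for an unsecured local cluster; adapt them to your environment.
import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")  # placeholder connection URL

# scan() pages through the index with the scroll API, so it handles large indices
with open("my_index_export.ndjson", "w") as out:  # placeholder output path
    for hit in scan(es, index="my_index", query={"query": {"match_all": {}}}):
        # write one JSON document per line (newline-delimited JSON)
        out.write(json.dumps(hit["_source"]) + "\n")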
What does Elasticsearch exporter do?
It helps in exporting data from Elasticsearch to various formats or destinations, such as files or databases.
How to take a dump of Elasticsearch?
Use the Snapshot API to take a dump of Elasticsearch data, saving it to a repository like Amazon S3 or HDFS.
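For reference, here is a minimal, hedged sketch of driving the Snapshot API over HTTP from Python. It assumes a shared-filesystem repository whose location is already whitelisted via path.repo in elasticsearch.yml, and the repository and snapshot names are placeholders.
import requests

ES = "http://localhost:9200"  # placeholder cluster URL

# Register a filesystem snapshot repository (its location must appear in path.repo)
repo_settings = {"type": "fs", "settings": {"location": "/mnt/es_backups"}}
print(requests.put(f"{ES}/_snapshot/my_backup_repo", json=repo_settings).json())

# Create a snapshot of all indices and wait for it to complete
print(requests.put(
    f"{ES}/_snapshot/my_backup_repo/snapshot_1",
    params={"wait_for_completion": "true"},
).json())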
Sign up for a 14-day free trial.
Share your experience of exporting Elasticsearch data in the comment section below.
Aman Deep Sharma is a data enthusiast with a flair for writing. He holds a B.Tech degree in Information Technology, and his expertise lies in making data analysis approachable and valuable for everyone, from beginners to seasoned professionals. Aman finds joy in breaking down complex topics related to data engineering and integration to help data practitioners solve their day-to-day problems.