How to Connect AWS Elasticsearch to Snowflake?: 2 Easy Methods

• July 15th, 2022

AWS Elasticsearch to Snowflake_FI

Built on Apache Lucene, AWS Elasticsearch is a distributed search and analytics engine. Elasticsearch, which was introduced in 2010 and has since grown to be the most widely used search engine, is frequently used for use cases involving log analytics, full-text search, security intelligence, business analytics, and operational intelligence.

Superior Data Analytics are made possible by the Cloud-based, fully managed Snowflake data warehouse solution. AWS, Azure, and GCP are used by Snowflake to handle its cloud infrastructure and data. The data is subjected to SQL queries by Snowflake in order to convert and offer insights.

In this article, you will learn how to transfer data from AWS Elasticsearch to Snowflake using CSV files format.

Table of Contents

What is AWS Elasticsearch?

AWS Elasticsearch to Snowflake: Elasticsearch logo
Image Source

AWS Elasticsearch is a distributed, open-source search and analytics engine with a strong REST API that can be used on a variety of platforms due to its speed and scalability. It permits the analytics of textual, numerical, and even geospatial data, which can be used for any intended purpose. Additionally, tasks such as data ingestion, storage, analysis, enrichment, and visualization can be carried out smoothly with it.

Key Features of AWS Elasticsearch

  • Fast APIs: Elasticsearch makes it simple to get started and quickly construct applications for a range of use-cases by providing simple REST based APIs, a simple HTTP interface, and uses schema-free JSON documents.
  • High Performance: Elasticsearch’s distributed architecture enables it to handle enormous amounts of data concurrently, swiftly locating the best matches for your searches.
  • Multiple Language Support: Numerous languages, including Java, Python, PHP, JavaScript, Node.js, Ruby, and many more, are supported by Elasticsearch.

What is Snowflake?

AWS Elasticsearch to Snowflake: Snowflake logo
Image Source

Snowflake is a fully managed SaaS (Software as a Service) platform. Data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data are all merged into one platform. To meet the demanding requirements of expanding enterprises, Snowflake provides pre-built capabilities including Storage and Compute Separation, On-the-Fly Scalable Compute, Data Sharing, Data Cloning, and Third-Party Tool Support.

The basis of Snowflake is not an existing database or “Big Data” software platform like Hadoop. In contrast, Snowflake blends a cutting-edge Cloud Architecture with a brand-new SQL query engine. Snowflake offers users a lot more in addition to all the features and operations that a corporate analytic database provides.

Key Features of Snowflake

The following are some of the features of Snowflake as a Software as a Service (SaaS) solution:

  • Snowflake enables you to create your own Data Communicate, enabling you to securely exchange real-time, controlled data.
  • By switching from nightly batch loads to real-time data streams in Snowflake, you may be able to improve the precision and effectiveness of your analysis.
  • Using the Caching Paradigm, Snowflake swiftly distributes cache results.
  • You may learn more about the habits and usage patterns of your customers with the aid of Snowflake.

Reliably integrate data with Hevo’s Fully Automated No Code Data Pipeline

If yours is anything like the 1000+ data-driven companies that use Hevo more than 70% of the business apps you use are SaaS applications. Integrating the data from these sources in a timely way is crucial to fuel analytics and the decisions that are taken from it. But given how fast API endpoints etc can change, creating and managing these pipelines can be a soul-sucking exercise.

Hevo’s no-code data pipeline platform lets you connect over 150+ sources in a matter of minutes to deliver data in near real-time to your warehouse. What’s more, the in-built transformation capabilities and the intuitive UI means even non-engineers can set up pipelines and achieve analytics-ready data in minutes. 

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software in terms of user reviews.

Take our 14-day free trial to experience a better way to manage data pipelines.

Get started for Free with Hevo!

Methods to Connect AWS Elasticsearch to Snowflake

Method 1: Connect AWS Elasticsearch to Snowflake using Hevo 

AWS Elasticsearch to Snowflake: Hevo Banner

Hevo helps you directly transfer data from 150+ sources such as AWS Elasticsearch to Snowflake, Database, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Sign up here for a 14-Day Free Trial!

The following steps can be implemented to connect AWS Elasticsearch to Snowflake using Hevo:

Step 1: Configure AWS Elasticsearch as your Source

Perform the following steps to configure Elasticsearch as the Source in Hevo:

  • Step 1.1: Click PIPELINES in the Asset Palette.
  • Step 1.2: Click + CREATE in the Pipelines List View.
  • Step 1.3: In the Select Source Type page, select Elasticsearch.
  • Step 1.4: In the Configure your Elasticsearch Source page, specify the following:
    Elasticsearch settings
  1. Pipeline Name: A unique name for your Pipeline, not exceeding 255 characters.
  2. Database Host: The Elasticsearch database host’s IP address or DNS. Provide the public IP address or DNS of the EC2 instance as retrieved in Step 3 if your Elasticsearch database is hosted inside a VPC.
    Note: For URL-based hostnames, exclude the protocol part (http:// or https://).
  3. Database Port: The port on which your Elasticsearch server listens for connections. Default value: 9200.
    Note: For an Elasticsearch database hosted inside a VPC, this port number is 443.
  4. Database User: The authenticated user that can read the tables in your database.
  5. Database Password: The password for the database user.
  6. Connection Options: Select one of the following options to specify how Hevo must access your database instance:
    1. Connect through SSH: Enable this option to connect to Hevo using an SSH tunnel, instead of directly connecting your Elasticsearch database host to Hevo. This provides an additional level of security to your database by not exposing your Elasticsearch setup to the public. Read Connecting Through SSH.
      If this option is disabled, you must whitelist Hevo’s IP addresses to allow Hevo to connect to your Elasticsearch host.
      Note: This option does not apply to an AWS Elasticsearch Source. To connect to that Source, you must set up a reverse proxy server.
    2. Connect through HTTPS: Enable this option if your cluster is configured to use HTTPS. Contact your administrator if you do not have this information. Keep this option disabled to connect using HTTP.
  7. Advanced Settings:
    1. Load Historical Data: If this option is enabled, the entire table data is fetched during the first run of the Pipeline. If disabled, Hevo loads only the data that was written in your database after the time of creation of the Pipeline.
    2. Include New Tables in the Pipeline: Applicable for all Pipeline modes except Custom SQL.
      If enabled, Hevo automatically ingests data from tables created in the Source after the Pipeline has been built. These may include completely new tables or previously deleted tables that have been re-created in the Source.
      If disabled, new and re-created tables are not ingested automatically. They are added in SKIPPED state in the objects list, in the Pipeline Overview page. You can update their status to INCLUDED to ingest data.
      You can change this setting later.
  • Step 1.5: Click TEST & CONTINUE.
  • Step 1.6: Proceed to configuring the data ingestion and setting up the Destination.

Step 2: Configure Snowflake as your Destination

To set up Snowflake as a destination in Hevo, follow these steps:

  • Step 2.1: In the Asset Palette, select DESTINATIONS.
  • Step 2.2: In the Destinations List View, click + CREATE.
  • Step 2.3: Select Snowflake from the Add Destination page.
  • Step 2.4: Set the following parameters on the Configure your Snowflake Destination page:
    • Destination Name: A unique name for your Destination.
    • Snowflake Account URL: This is the account URL that you retrieved.
    • Database User: The Hevo user that you created in the database. In the Snowflake database, this user has a non-administrative role.
    • Database Password: The password of the user.
    • Database Name: The name of the Destination database where data will be loaded.
    • Database Schema: The name of the Destination database schema. Default value: public.
    • Warehouse: SQL queries and DML operations are performed in the Snowflake warehouse associated with your database.

facebook page insights to snowflake: configure snowflake as destination

  • Step 2.5: Click Test Connection to test connectivity with the Snowflake warehouse.
  • Step 2.6: Once the test is successful, click SAVE DESTINATION.

Deliver smarter, faster insights with your unified data

Using manual scripts and custom code to move data into the warehouse is cumbersome. Changing API endpoints and limits, ad-hoc data preparation and inconsistent schema makes maintaining such a system a nightmare. Hevo’s reliable no-code data pipeline platform enables you to set up zero-maintenance data pipelines that just work.

  • Wide Range of Connectors:  Instantly connect and read data from 150+ sources including SaaS apps and databases, and precisely control pipeline schedules down to the minute.
  • In-built Transformations: Format your data on the fly with Hevo’s preload transformations using either the drag-and-drop interface, or our nifty python interface. Generate analysis-ready data in your warehouse using Hevo’s Postload Transformation.
  • Near Real-Time Replication: Get access to near real-time replication for all database sources with log based replication. For SaaS applications, near real time replication is subject to API limits.   
  • Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with destination warehouse so that you don’t face the pain of schema errors.
  • Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in data flow.
  • 24×7 Customer Support:  With Hevo you get more than just a platform, you get a partner for your pipelines. Discover peace with round the clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-dayfree trial.
  • Security: Discover peace with end-to-end encryption and compliance with all major security certifications including HIPAA, GDPR, SOC-2.
Get started for Free with Hevo!

Get Started for Free with Hevo’s 14-day Free Trial.

Method 2: Connect AWS Elasticsearch to Snowflake Manually using CSV Files

You cannot directly export the data from AWS Elasticsearch to Snowflake. To export data from AWS Elasticsearch to Snowflake, first you will have to export data from AWS Elasticsearch as CSV files and then load the CSV files into Snowflake.

Step 1: Export Data from AWS Elasticsearch as CSV using Python Pandas

The first step in exporting data from AWS Elasticsearch to Snowflake is exporting data from AWS Elasticsearch as CSV files.

  • Documents can be exported in a variety of formats using Python Pandas. Elasticsearch Pandas may export files in the formats of HTML, CSV, or JSON. Use the following command to install PIP for Python 3.
sudo apt install python3-pip
sudo yum install python36
sudo yum install python36-devel
sudo yum install python36-setuptools
sudo easy_install-3.6 pip
  • The entire Python Pandas script for exporting and starting a data export in any of these formats is as follows:
#!/usr/bin/env python3
#-*- coding: utf-8 -*-
 
import sys, time, io
start_time = time.time()
 
if sys.version[0] != "3":
    print ("nThis script requires Python 3")
    print ("Please run the script using the 'python3' command.n")
    quit()
try:
    # import the Elasticsearch low-level client library
    from elasticsearch import Elasticsearch
    # import Pandas, JSON, and the NumPy library
    import pandas, json
    import numpy as np
except ImportError as error:
    print ("nImportError:", error)
    print ("Please use 'pip3' to install the necessary packages.")
    quit()
# create a client instance of the library
print ("ncreating client instance of Elasticsearch")
elastic_client = Elasticsearch()
 
"""
MAKE API CALL TO CLUSTER AND CONVERT
THE RESPONSE OBJECT TO A LIST OF
ELASTICSEARCH DOCUMENTS
"""
# total num of Elasticsearch documents to get with API call
total_docs = 20
print ("nmaking API call to Elasticsearch for", total_docs, "documents.")
response = elastic_client.search(
    index='employees',
    body={},
    size=total_docs
)
# grab list of docs from nested dictionary response
print ("putting documents in a list")
elastic_docs = response["hits"]["hits"]
 
"""
GET ALL OF THE ELASTICSEARCH
INDEX'S FIELDS FROM _SOURCE
"""
#  create an empty Pandas DataFrame object for docs
docs = pandas.DataFrame()
# iterate each Elasticsearch doc in list
print ("ncreating objects from Elasticsearch data.")
for num, doc in enumerate(elastic_docs):
    # get _source data dict from document
    source_data = doc["_source"]
    # get _id from document
    _id = doc["_id"]
    # create a Series object from doc dict object
    doc_data = pandas.Series(source_data, name = _id)
    # append the Series object to the DataFrame object
    docs = docs.append(doc_data)
 
"""
EXPORT THE ELASTICSEARCH DOCUMENTS PUT INTO
PANDAS OBJECTS
"""
print ("nexporting Pandas objects to different file types.")
 
# export the Elasticsearch documents as a JSON file
docs.to_json("objectrocket.json")
# have Pandas return a JSON string of the documents
json_export = docs.to_json() # return JSON data
print ("nJSON data:", json_export)
# export Elasticsearch documents to a CSV file
docs.to_csv("objectrocket.csv", ",") # CSV delimited by commas
# export Elasticsearch documents to CSV
csv_export = docs.to_csv(sep=",") # CSV delimited by commas
print ("nCSV data:", csv_export)
# create IO HTML string
import io
html_str = io.StringIO()
# export as HTML
docs.to_html(
    buf=html_str,
    classes='table table-striped'
)
 
# print out the HTML table
print (html_str.getvalue())
# save the Elasticsearch documents as an HTML table
docs.to_html("objectrocket.html")
 
print ("nntime elapsed:", time.time()-start_time)

Code snippets taken from ObjectRocket.

You can check out other methods for exporting data from AWS Elasticsearch here. Now you have your CSV Data with you. The first step in exporting data from AWS Elasticsearch to Snowflake is complete now.

Step 2: Load CSV Data into Snowflake 

The second step in exporting data from AWS Elasticsearch to Snowflake is importing CSV data into Snowflake. 

This section explains how to bulk load data into Snowflake using the SnowSQL client. Data can be loaded in bulk using SQL from any delimited plain-text file, such as Comma-delimited CSV files. Bulk loading is also possible for semi-structured data from JSON, AVRO, Parquet, or ORC files. But loading from CSV files is the article’s main focus. On what are referred to as stages on the inside, you can stage files in Snowflake. Each patron and each table has a stage. Snowflake further enables the development of named stages, such as demo stages.

  • Step A: Upload your data files so that Snowflake can access them to get started. What this amounts to is staging your files.
    • Without utilizing any external resources, internal stages enable convenient and safe data file storage. However, you can avoid staging and load directly from these external destinations provided your data files are already staged in a compatible Cloud storage like GCS or S3.
    • Additionally, you can simply upload CSV files from your personal computer.
  • Step B: From these prepared files, you next load your data into your tables. The “use” statement will be used to choose the previously built database.

Syntax

Use database [database-name];  

Example

use database dezyre_test;

Output:

AWS Elasticsearch to Snowflake: O/P 1
Image Source
  • Step C: Setting up a named file format for a collection of staged data that can be read or loaded into Snowflake tables.

Syntax: 

CREATE [ OR REPLACE ] FILE FORMAT [ IF NOT EXISTS ] 
                      TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ]
                      [ COMMENT = '' ]

Example: 

create or replace file format my_csv_format
  type = csv
  field_delimiter = '|'
  skip_header = 1
  null_if = ('NULL', 'null')
  empty_field_as_null = true
  compression = gzip;

Output:

AWS Elasticsearch to Snowflake: O/P 2
Image Source
  • Step D: At this point, a table is created using the Create statement, as seen below. It either creates a new table or changes an existing one in the current or specified schema.

Syntax: 

CREATE [ OR REPLACE ] TABLE  [ (  [  ] ,  [  ] , ... ) ] ;

Example: 

CREATE OR REPLACE TABLE dezyre_employees (
EMPLOYEE_ID number,
FIRST_NAME varchar(25),
LAST_NAME varchar(25),
EMAIL varchar(25),
PHONE_NUMBER varchar(15),
HIRE_DATE DATE,
JOB_ID varchar(15),
SALARY  number(12,2),
COMMISSION_PCT  real,
MANAGER_ID number,
DEPARTMENT_ID number
);

Output:

AWS Elasticsearch to Snowflake: O/P 3
Image Source
  • Step E: Using your local computer, upload the CSV data file to the Snowflake’s staging area as shown below. Additionally, if the location is secured, you can include the access credentials along with the URLs for the locations of the staged CSV files. You can also create named stages that point to different locations.

Syntax: 

put file://D:\dezyre_emp.csv @DEMO_DB.PUBLIC.%dezyre_employees;

Output:

AWS Elasticsearch to Snowflake: O/P 4
Image Source
  • Step F: At this point, as seen below, the CSV data is loaded into the previously created target Snowflake table.

Example: 

copy into dezyre_employees
  from @%dezyre_employees
  file_format = (format_name = 'my_csv_format' , error_on_column_count_mismatch=false)
  pattern = '.*dezyre_emp.csv.gz'
  on_error = 'skip_file';

Output:

AWS Elasticsearch to Snowflake: O/P 5
Image Source
  • Step G: You can verify that the information loaded into the target database is accurate by performing the select query shown below.

Example: 

select * from dezyre_employees;

Output:

AWS Elasticsearch to Snowflake: O/P 6
Image Source

You have successfully done AWS Elasticsearch to Snowflake data transfer.

Limitations of Connecting AWS Elasticsearch to Snowflake Manually

  • Data can only be moved from AWS Elasticsearch to Snowflake in one direction using the manual method. In order to maintain both tools up to date, two-way sync is required.
  • The manual process takes time because the records need to be updated often. This is a waste of time and resources that could be used for more crucial company duties.
  • Some customers may find the amount of engineering bandwidth needed to maintain workflows across numerous platforms and update current data bothersome.
  • No transformation is possible during data transport. This could be a big problem for companies that want to change their data before moving it from AWS Elasticsearch to Snowflake.

Conclusion  

In this article, you got a glimpse of how to connect AWS Elasticsearch to Snowflake after a brief introduction to the salient features, and use cases. The methods talked about in this article are using automated solutions such as Hevo and CSV files. The second process can be a bit difficult for beginners. Moreover, you will have to update the data each and every time it is updated and this is where Hevo saves the day!

Visit our Website to Explore Hevo

Hevo provides its users with a simpler platform for integrating data from 150+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources like AWS Elasticsearch. You can use it to transfer data from multiple data sources into your Data Warehouses, Database, or a destination of your choice such as Snowflake. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share your experience of learning about AWS Elasticsearch to Snowflake! Let us know in the comments section below!

No-code Data Pipeline For Snowflake