Druid Superset Integration: 3 Easy Steps

• June 11th, 2021

As of June 2021, about 4.72 billion people are connected to the internet, and the number keeps growing: in the past 12 months alone, it rose by a staggering 332 million. What does this mean for analytics? Lots and lots of data. More and more companies are turning to the internet to serve their customers, and, just as in traditional business, their owners need to study client data to understand profits, trends, and general business behavior. Keep in mind that many companies generate a vast amount of data, sometimes several terabytes per day.

With this in mind, this post gives you a comprehensive introduction to Druid Superset Integration and outlines the benefits of using the two tools together. Read along!

Introduction to Apache Druid

The nature of data generated by modern data-driven companies, and by other real-world problems, has led to the rise of real-time Data Analysis. In simple terms, this is a form of Data Analysis in which data is analyzed as soon as it is collected, with no delay. Real-time analysis is used in areas such as crisis management, location management, and analyzing customer trends. This is where Apache Druid comes in.

Apache Druid is a distributed Data Store designed for large volumes of data. It is primarily used where real-time ingestion, super-fast queries, and high uptime matter, such as supply chain analytics.

Key Features of Apache Druid

Below are some of the critical features of the software:

  • Columnar Storage Format: This feature is integral in boosting Apache Druid’s speed in real-time information handling. The column-oriented storage format only loads the columns required for a particular query. Moreover, each of these columns is optimized to meet the specifications of a particular data type. 
  • Real-time Data Ingestion: Data is collected either in real-time or in batches based on user specifications. 
  • Cloud-Native: Apache Druid is built to be fault-tolerant. Once data is ingested, a copy is kept in deep storage, so if one of the servers fails, the data can be recovered easily.
  • Superfast Searches and Filtering: The tool uses Roaring or CONCISE Compressed Bitmap Indexes for quick filtering and searches on numerous columns.
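
To make the columnar-storage and bitmap-index ideas concrete, here is a toy sketch of how a filter over two columns can be answered with bitmaps. It is not Druid's implementation: Roaring and CONCISE are compressed formats, whereas this uses plain Python integers as uncompressed bitmaps, and the column names and rows are invented for illustration.

```python
from collections import defaultdict

# Toy column-oriented table: each column is stored as its own list.
# Column names and values are made up for this illustration.
columns = {
    "country": ["US", "DE", "US", "IN", "DE"],
    "device": ["mobile", "desktop", "desktop", "mobile", "mobile"],
    "clicks": [3, 7, 1, 4, 9],
}


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


country_index = build_bitmap_index(columns["country"])
device_index = build_bitmap_index(columns["device"])

# WHERE country = 'DE' AND device = 'mobile' becomes a bitwise AND of two bitmaps.
matching = rows_from_bitmap(country_index["DE"] & device_index["mobile"])

# Only the "clicks" column has to be read to compute the aggregate.
total_clicks = sum(columns["clicks"][row] for row in matching)
print(matching, total_clicks)  # prints: [4] 9
```

The point of the sketch is that the filter never touches the row data itself: it combines two small per-value bitmaps and then reads only the one column the query aggregates.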

Introduction to Apache Superset

Apache Superset is an open-source, cloud-native application for large-scale Data Visualization and data exploration. With it, users of all skill levels can interact with data using graphs, charts, and more.

Key Features of Apache Superset

Below are some of the features of Apache Superset that make it ideal for Data Visualization:

  • It has a user-friendly interface that makes it suitable even for users with basic skills.
  • You can create different dashboards and share them with the relevant parties.
  • The software includes a rich set of Data Visualizations.
  • It integrates with Apache Druid (Druid.io).
  • The software comes with a security layer that lays out well-defined rules on who gets access to the system.
  • It uses SQLAlchemy to integrate with most SQL-speaking RDBMSs, including PostgreSQL and MySQL.
  • Its interface offers customization options for what is displayed in the UI.

By now, you should have a rough idea of what both Apache Druid and Apache Superset are in terms of data storage and analytics. It should be clear that Apache Druid is efficient for real-time data collection, whereas Superset is designed for Data Visualization. By connecting the two, you can capture your business data as it comes in using Apache Druid and visualize it in Apache Superset. Let’s see how you can do this.

Prerequisites

There are several factors you need to check before connecting the two tools:

  • You should install both Apache Superset and Apache Druid on your system.
  • You need Linux, macOS, or another Unix-like OS for Apache Druid.
  • You also need Java 8 for Apache Druid (the Druid release used in this tutorial targets Java 8; newer Java versions may not be fully supported by it).
  • For Apache Superset, you need Python 3.8 or higher. Also, keep in mind that Windows is not supported by either of these tools.
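
The checklist above can be turned into a quick self-check before you start. The following is a minimal sketch, not an official installer check: the function name `check_prerequisites` is made up for this example, and it only verifies that a `java` executable is on the PATH, not which Java version it is.

```python
import platform
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict of simple pass/fail checks for this machine."""
    return {
        # Neither tool supports Windows, per the list above.
        "unix_like_os": platform.system() != "Windows",
        # Superset requirement stated above: Python 3.8 or higher.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # Only checks that a `java` executable is on PATH, not its version.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, passed in check_prerequisites().items():
        print(f"{name}: {'ok' if passed else 'MISSING'}")
```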

Simplify your Data Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate data from 150+ data sources (including 30+ Free Data Sources) to a destination of your choice like Apache Superset in real-time in an effortless manner. Hevo, with its minimal learning curve, can be set up in just a few minutes, allowing users to load data without having to compromise performance. Its strong integration with a large number of sources gives users the flexibility to bring in data of different kinds smoothly, without having to code a single line.

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 150+ sources, that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

You can try Hevo for free by signing up for a 14-day free trial.

Understanding the Steps for Druid Superset Integration

Here are the steps that you can follow to set up Druid Superset Integration:

Druid Superset Integration: Apache Druid Installation Process

Once you have checked all the requirements, you can go ahead and install Apache Druid on your system. This tutorial uses Ubuntu. Here are the steps involved in this process:

Step 1: Downloading Apache Druid

Head on to your terminal and type in the following:

java -version
sudo apt install openjdk-8-jre-headless

The first command checks whether Java is already installed, and the second downloads and installs the Java 8 Runtime Environment, which Apache Druid needs in order to run. The next step is downloading the latest Apache Druid version, which at the time of writing is 0.20.1. Download and unpack it using the following commands (if the mirror no longer hosts this release, older releases are kept in the Apache archive at archive.apache.org):

wget https://apache.claz.org/druid/0.20.1/apache-druid-0.20.1-bin.tar.gz
tar -xzf apache-druid-0.20.1-bin.tar.gz
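
Apache releases are published together with checksum files, so it is worth confirming that the archive downloaded intact before unpacking it. The sketch below computes the SHA-512 digest of a local file and compares it with a digest you supply. It assumes you have copied the expected value from the `.sha512` file published alongside the release; the function names are hypothetical.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    expected = "".join(expected_hex.split()).lower()
    return sha512_of_file(path) == expected
```

For example, `matches_expected("apache-druid-0.20.1-bin.tar.gz", published_digest)` should return `True` for an intact download, where `published_digest` is the string you copied from the checksum file.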

Step 2: Enabling Metadata Authentication for Apache Druid

You need to enable basic metadata authentication in Apache Druid’s configuration. Navigate to Apache Druid’s directory and open the common.runtime.properties file in your text editor as follows:

cd apache-druid-0.20.1
nano conf/druid/single-server/micro-quickstart/_common/common.runtime.properties

Once this is done, add druid-basic-security to the list of extensions:

druid.extensions.loadList=[...,"druid-basic-security"]

Next up, configure the authenticator. This sets the initial password for Druid’s default admin user, which you will later use to run SQL queries from Superset, and a password for Druid’s internal client:

#
# Authentication
#
druid.auth.authenticatorChain=["MyBasicMetadataAuthenticator"]
druid.auth.authenticator.MyBasicMetadataAuthenticator.type=basic
druid.auth.authenticator.MyBasicMetadataAuthenticator.initialAdminPassword=password1
druid.auth.authenticator.MyBasicMetadataAuthenticator.initialInternalClientPassword=password2
druid.auth.authenticator.MyBasicMetadataAuthenticator.credentialsValidator.type=metadata
druid.auth.authenticator.MyBasicMetadataAuthenticator.skipOnFailure=false
druid.auth.authenticator.MyBasicMetadataAuthenticator.authorizerName=MyBasicMetadataAuthorizer

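You also need to add an escalator and an authorizer as follows. The escalator’s internal client password should match the initialInternalClientPassword set above: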

# Escalator
druid.escalator.type=basic
druid.escalator.internalClientUsername=druid_system
druid.escalator.internalClientPassword=password2
druid.escalator.authorizerName=MyBasicMetadataAuthorizer
# Authorizer
druid.auth.authorizers=["MyBasicMetadataAuthorizer"]
druid.auth.authorizer.MyBasicMetadataAuthorizer.type=basic
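
Because the authenticator, escalator, and authorizer settings refer to one another by name, a typo in any of them is easy to miss. The following sketch parses the properties text and reports a few inconsistencies. It is a convenience check written for this post's configuration, not a Druid tool; the function names are hypothetical and the rules encode only the relationships visible in the settings above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_auth_problems(props, name="MyBasicMetadataAuthenticator"):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    prefix = f"druid.auth.authenticator.{name}."
    # The escalator logs in as the internal client, so the passwords should agree.
    if props.get(prefix + "initialInternalClientPassword") != props.get(
        "druid.escalator.internalClientPassword"
    ):
        problems.append("escalator password differs from initialInternalClientPassword")
    # Every authorizer referenced by name should be declared in the authorizers list.
    authorizers = props.get("druid.auth.authorizers", "")
    for key in (prefix + "authorizerName", "druid.escalator.authorizerName"):
        if f'"{props.get(key)}"' not in authorizers:
            problems.append(f"{key} is not listed in druid.auth.authorizers")
    # The authenticator itself should appear in the authenticator chain.
    if f'"{name}"' not in props.get("druid.auth.authenticatorChain", ""):
        problems.append(f"{name} is missing from druid.auth.authenticatorChain")
    return problems
```

To use it, read the edited file into a string and call `find_auth_problems(parse_properties(text))`; an empty list means none of these particular mismatches were found.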

For more information on this section, please refer to the Apache Druid documentation. With the configuration saved, Apache Druid is installed on your system. Start it (the distribution includes a bin/start-micro-quickstart script for the single-server configuration edited above), then open the Apache Druid interface by navigating to http://localhost:8888 in a browser of your choosing.
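
Before moving on to Superset, it can help to confirm that Druid accepts the admin credentials over HTTP. The sketch below builds a request to Druid's SQL endpoint (`/druid/v2/sql`) with HTTP Basic authentication. The defaults are assumptions taken from this tutorial: host `localhost`, port 8888, user `admin`, and the example password `password1`. Adjust them to your setup.

```python
import base64
import json
import urllib.request


def build_sql_request(query, host="localhost", port=8888,
                      user="admin", password="password1"):
    """Build (but do not send) a POST to Druid's SQL endpoint with Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:{port}/druid/v2/sql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run_sql(query, **kwargs):
    """Send the query and return the decoded JSON rows. Requires a running Druid."""
    request = build_sql_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With Druid running, `run_sql("SELECT 1")` should return a small JSON result, while a wrong password should make `urlopen` raise an `HTTPError`.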

Druid Superset Integration: Apache Superset Installation Process

You can install Apache Superset by using Python on an Operating System like Ubuntu. The installation involves four key steps:

  • Installation of key Dependencies
  • Upgrading Python Pip and Setup Tools
  • Installation and Initialization of Apache Superset
  • Logging into Apache Superset

For more detailed information on these steps, you can refer to this blog.
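
One dependency the list above does not spell out: Superset connects to Druid through the `pydruid` package, which provides the `druid://` SQLAlchemy dialect. A minimal sketch for checking that it is available in the Python environment where Superset is installed (the function name is made up for this example):

```python
import importlib.util


def druid_driver_installed():
    """True when the pydruid package can be found in this environment."""
    return importlib.util.find_spec("pydruid") is not None


if __name__ == "__main__":
    if druid_driver_installed():
        print("pydruid found: Superset should be able to load the druid:// dialect")
    else:
        print("pydruid not found: install it with `pip install pydruid`")
```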

Druid Superset Integration: Add Database to Apache Superset

Because companies handle such vast amounts of data, analysts need numerous dashboards for visualizing data in real-time and robust software for data collection. This is where Apache Druid and Apache Superset come in. The two tools have intertwined histories: Apache Superset was initially designed as a visualization tool for Apache Druid, so the integration between them is well supported.

Establishing Druid Superset Integration is pretty straightforward and takes far less time than installing each of these tools.

  • Step 1: Open Apache Superset, hover over the Data tab, and select Databases. Click the plus sign found in the top-right corner of the screen.
  • Step 2: A new dialog box will appear prompting you for the database details. Name the database and key in the SQLAlchemy connection string for your Apache Druid instance.
  • Step 3: Remember to adjust the connection string:
    • Replace the example address, 10.0.0.1, with your Apache Druid address.
    • Replace the password with the one you set up initially during Apache Druid installation.
  • Step 4: Next up, test the Druid Superset connection to ensure there are no problems. If the test succeeds, you have established Druid Superset Integration. You can now visualize data in real-time!
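
The connection string in Step 2 follows the pattern used by the `pydruid` SQLAlchemy dialect: `druid://<user>:<password>@<host>:<port>/druid/v2/sql`. The helper below is a minimal sketch for assembling it. The user `admin`, the password `password1`, and port 8888 are assumptions carried over from the Druid setup earlier in this post, and the function name is hypothetical. Your deployment may expose SQL on a different port.

```python
from urllib.parse import quote


def druid_sqlalchemy_uri(host, password, user="admin", port=8888, scheme="druid"):
    """Build a druid:// URI, percent-encoding the credentials."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{credentials}@{host}:{port}/druid/v2/sql"


print(druid_sqlalchemy_uri("10.0.0.1", "password1"))
# prints: druid://admin:password1@10.0.0.1:8888/druid/v2/sql
```

Percent-encoding matters when the password contains characters such as `@` or `:`, which would otherwise be misread as URI separators.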

Conclusion

You have installed both Apache Druid and Apache Superset and established Druid Superset Integration by following the steps laid out above. This allows you to analyze your data in real-time, which gives you a better chance of spotting anomalies in your business.

Extracting complex data from a diverse set of data sources to carry out an insightful analysis can be a challenging task and this is where Hevo saves the day! Hevo offers a faster way to move data from Databases or SaaS applications into your Data Warehouse to be visualized in a BI tool such as Apache Superset. Hevo is fully automated and hence does not require you to code. You can try Hevo for free by signing up for a 14-day free trial. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
