Data Analytics and Business Intelligence have become an integral part of strategy creation and decision-making for Business organizations. The data generated and stored due to daily operations require to be analysed using visualizations tools to obtain actionable insights.
Apache Superset is one such Business Intelligence and analysis tool that can be used by an analytics team to complete their tasks.
You will learn the process of installation of Apache Superset Dashboards and the steps that need to be followed for creating Superset Dashboards after establishing a connection with a database.
Introduction to Apache Superset
Apache Superset is a Python-based Business Intelligence platform. It is an open-source tool offered by Airbnb. It is a fast, lightweight, and intuitive tool that enables users to interact with different kinds of data sources and provide actionable insights of the data using visualizations such as Aggregation Charts, Tables, and Interactive Maps.
Key Features of Apache Superset
- Simple yet Effective User Interface: Simple no-code visual builder or the SQL IDE can be used to organise and visualize data according to the requirements.
- Latest Architecture: The platform is lightweight and highly scalable that utilizes the power of existing data infrastructure.
- Custom Visualisations and Dashboards: Various beautiful visualizations are available and the plugin architecture makes it easy to build visualizations and add them directly to the Superset platform.
Understanding Installation of Apache Superset Dashboards
The following process is required to be followed for a standard installation of Apache Superset using a Linux-based Command Line Interface.
Step 1: Installation of Dependencies
Run the following command to install OS-level dependencies:
sudo apt-get install build-essential libssl-dev libffi-dev python-dev
If you have python3.5 installed alongside python2.7, run the following commands also:
sudo apt-get install build-essential libssl-dev libffi-dev python3.5-dev python-pip libsasl2-dev libldap2-dev
Step 2: Installation of Setup tools
Run the following command to install the latest version of pip and setup tools libraries,
pip install --upgrade setuptools pip
Step 3: Installation on Apache Superset
Run the following commands to install and initialize Apache Superset:
# Install superset
pip install superset
# Create an admin user (you will be prompted to configure username, first and last name before setting up a password)
fabmanager create-admin --app superset
# Initialize the database
superset db upgrade
# Load some data to play with
superset load_examples
# Create default roles and permissions
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset runserver -d
Step 4: Creation of Admin Account
After successful installation, from your browser go to http://localhost:8088
and create an admin account using the credentials.
For connecting with databases such as MySQL, you will be required to install the database connector using the following command,
install pip install mysqlclient
Step 5: Addition of Connection
To add a new connection, firstly login to Apache Superset, then click on “Sources” and subsequently select “Databases”.
Then you need to click the “+ Database” button in the top right corner.
Provide the credentials such as Connection Name and SQLAlchemy URI [Uniform Resource Identifier]. A URI should look similar to this:
mysql://root:XXXXXXXXXX@104.198.32.xxx:3306/rd_demo_db
Click on the “Test Connection” button to confirm if the connection can be established. On success, click on the “Add” button to save the connection.
Creating Apache Superset Dashboards
The process for the creation of Apache Superset Dashboards starts with the Data Sources [Databases and Tables] then they proceed with Data Slices and lastly proceed with Dashboards.
Each Dashboard is associated with multiple Slices and Each Slice can be associated with multiple Dashboards.
After a connection with the data source has been established, the following steps are required to be taken for creating a Dashboard:
Step 1: Registration of a New Table
You need to start with specific tables [termed as Datasets in Superset] that are supposed to be accessed in Superset Dashboards for querying. Proceed by selecting “Data” then “Datasets” and lastly the “+Dataset” button.
Choose the Database Schema and then the Table of your choice from the options being displayed.
Step 2: Customization of Column Properties
Configuration of the Column Properties is to control how the column should be accessed in the Explore Workflow.
Below you can observe how the access of Temporal, Filterable, and Dimension are controlled in the dataset.
Superset Semantic Layer provides functionality to store 2 types of data as follows:
- Virtual Metrics: SQL queries that use Aggregate values from numerous columns come in this category. Example:
SUM(recovered) / SUM(confirmed);
- Virtual Calculated Columns: SQL queries that customize the appearance and behaviour of a specific column come in this category. Example:
CAST(recovery_rate as FLOAT);
Step 3: Creation of Charts on Explore View
There are two main interfaces to explore data in Superset:
- Explore: It is a no-code visual builder in which the user can select a chart and customize the appearance.
- SQL Lab: It is an SQL IDE that can be used for cleaning, joining, and preparing data for Explore Workflow.
Using the Explore builder, you can check the Dataset view on the left side of the window, which displays the columns and metrics scoped to the currently selected dataset.
A Time-series Bar Chart can be easily created by clicking the options in the drop-down menu.
Step 4: Creation of Slice and Dashboard
You can either save your chart and add it to the existing Dashboard or to a new Dashboard of your choice.
The chart can be published by clicking “Save & Go To Dashboard”
The chart can be resized using the “Pencil” button, found in the top right corner.
You can also click and drag on the bottom right corner of the chart until it snaps into the position that you desire depending upon the underlying grid.
Click on the “Save” button to save your progress.
Understanding Various Apache Superset Dashboards Operations
You may need to enable the functionality to upload a CSV or Excel file to your database. The following section explains how to enable this functionality for a sample database.
Superset Dashboards Operations: Pivot Table
Pivot Table is a powerful tool to calculate, summarize, and analyze data allowing you to identify patterns and trends in your data and set up comprehensive comparisons. After selecting a Chart from the top right corner, choose the data source and click on the “visualization” type to see the “Visualisation” menu. Then proceed with selecting the Pivot Table visualization.
Enter the parameters on the window according to your requirement keeping into consideration the context of the data being handled.
After addition of the Query conditions in the window, the Pivot Table will be generated like the one shown below.
Line Chart
The line chart is a type of chart which helps display information as a series of data points connected by straight line segments. In Superset, you are required to select the Data Range from the Dataset of your choice. You can also implement certain queries of the Data Range and plot the results on the Line chart.
Next, select Run Query to show the data on the chart.
After you select the Data Range and enter the required Query, click on the “Run Query” button to show the data on the chart.
Markup
Users can incorporate Markups and annotations on the content depending on the requirements of the user. You can add text to your dashboard by navigating to the dashboard using the top menu and then going to edit mode by selecting “Edit Dashboard”.
Then in the “Insert Components” pane, you will be required to drag and drop a Markdown box on the dashboard.
You can edit the text in the textbox in the Markdown format, and also toggle between Edit and Preview mode using the menu located on the top of the window.
Time Comparison
Progress evaluation using Time comparison can be exceptionally crucial for any organization. You need to navigate to the Time Comparison option in the Advanced Analytics section of the Visualization menu. Subsequently, you can run a query to build/generate a new chart that will have additional series with the same values shifted only a week back in time.
You can then change the calculator type to Absolute difference and select Run Query, to see one series at a time showing the difference between the two series.
Learn More About:
Apache Superset Python Installation
Conclusion
In this article, you learned about the Apache Superset platform and its key features.
You also learned about the process of installing Apache Superset Dashboards using a command-line interface and the steps that need to be followed for creating Superset Dashboards after establishing a connection with a database.
Abhishek is a data analysis enthusiast with a strong passion for data, software architecture, and crafting technical content. He has strong data analytical skills and practical experience in leveraging tools like WordPress, RankMath, and SEMRush to optimize content strategies while contributing significantly to Hevo Data.