Setting up Superset GitHub Integration: 3 Easy Methods

on Apache Superset, BI Tool, Data Integration • June 21st, 2021

Business intelligence has become a core part of day-to-day business operations. Data visualization and analytics in particular have gained prominence in recent years, as business owners and executive teams try to make informed decisions in real time.

Data analysts commonly use conventional business analytics tools such as QlikView, Microsoft Excel pivot tables, and Power BI. Superset is a powerful business intelligence and data analytics alternative whose advanced features make data visualization easier. Several well-known companies already use Superset, including Twitter, Yahoo, Airbnb, and Udemy.

Apache Superset is a relatively new data analytics and business intelligence application with many useful features and a simple user interface, so you do not need to be a programming expert to use it. It also lets you organize, collate, explore, and clean up your data. Because Superset is open source, with its code hosted on GitHub, you can modify it to suit your needs.

Advantages of using Superset GitHub Integration

Superset has some advantages that position it as one of the best data analytical tools. These include:

  • A wide variety of attractive visualizations for presenting and analyzing data.
  • The ability to explore and organize large datasets and build interactive, comprehensive dashboards from them.
  • Effortless drill-down into datasets for deeper insights.
  • Fast, responsive dashboards that save users time.
  • No programming expertise required, while data integrity is still protected.

Unique Features of Superset

The application has several features that make it user-friendly. These features include:

SQL Editor

Superset comes with a SQL editor, SQL Lab, for interactive querying. With it, users can run and manage queries against their datasets and use SQL to select just the range of data or the columns they need.

Control and Permissions over Charts and Dashboards

Superset helps keep datasets protected by letting you define how much control, and which permissions, other users have over charts and dashboards. This lets you manage access to the data and avoid compromising it.

Database Support

Superset also supports many SQL databases, including Oracle, MySQL, Microsoft SQL Server, Sybase, and PostgreSQL.

The official Apache Superset documentation is available at https://superset.apache.org.

3 Ways to Set up Superset GitHub Integration

You can use any of the following methods to set up Superset GitHub Integration, depending on your needs:

Method 1: Superset GitHub Integration using Local Configuration with Docker 

With this method, you manually set up Superset from its GitHub repository using a local Docker configuration.

Method 2: Superset GitHub Configuration Through the Installation of Python Packages

With this method, you set up Superset by manually installing the required Python packages on your deployment, which then gives you access to your data.

Method 3: Superset GitHub Integration using Hevo

Hevo is a no-code data pipeline. It automatically loads your GitHub data into Superset without you writing a single line of code.

Let’s discuss each of them in detail.

Procedure of Setting Up Superset GitHub Integration

Method 1: Superset GitHub Integration using Local Configuration with Docker 

Docker is the optimal and most recommended way to install Superset locally. Superset does not officially support Windows, so a virtual machine (VM) workaround for Windows users is described below. Configuring Superset with Docker takes a few steps:

Step 1: Installation of a Docker-Compose and Engine on the Device 

First, install Docker on your machine. Docker for Mac includes the Docker Engine and an up-to-date version of Docker Compose. After installing Docker for Mac, increase the memory allocated to Docker to 6 GB, because Superset may have trouble starting with the default 2 GB of RAM. You can change this in the “Resources” section of the Preferences pane.

On Linux, install Docker Engine by following the instructions for your distribution, and then install Docker Compose separately.

Superset cannot be installed directly on Windows. Windows users can work around this by creating an Ubuntu Desktop virtual machine (VM) and installing Docker for Linux inside it. Allocate at least 8 GB of RAM and a 40 GB hard drive to the VM so that it has enough resources to run the application.
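Before moving on, it can help to confirm that Docker actually sees the memory recommended above. The sketch below is a minimal check, assuming a POSIX shell and a running Docker engine: it reads the engine's total memory in bytes with docker info --format '{{.MemTotal}}'. Docker usually reports slightly less than the configured amount, so the check accepts anything at or above 90% of the target. That tolerance, like the function names has_enough_memory and check_docker_memory, is our own assumption rather than an official threshold.

```shell
# Sketch: check whether Docker has roughly the memory recommended above
# (6 GB for Docker for Mac, 8 GB for the Ubuntu VM on Windows).
# The 90% tolerance is an assumption: Docker reports a bit less than configured.

GIB=$((1024 * 1024 * 1024))

# has_enough_memory BYTES TARGET_GIB
# Exit status 0 if BYTES is at least 90% of TARGET_GIB gibibytes.
has_enough_memory() {
  bytes="$1"
  target_gib="$2"
  [ $((bytes * 10)) -ge $((target_gib * 9 * GIB)) ]
}

# check_docker_memory [TARGET_GIB]
# Asks the running Docker engine for its total memory; target defaults to 6 GiB.
check_docker_memory() {
  target="${1:-6}"
  if ! total=$(docker info --format '{{.MemTotal}}' 2>/dev/null); then
    echo "docker info failed; is the Docker engine running?" >&2
    return 2
  fi
  if has_enough_memory "$total" "$target"; then
    echo "OK: Docker sees $((total / 1024 / 1024)) MiB"
  else
    echo "Only $((total / 1024 / 1024)) MiB available; raise it towards ${target} GiB" >&2
    return 1
  fi
}
```

Run check_docker_memory after changing the Resources setting on a Mac, or check_docker_memory 8 inside the Ubuntu VM on Windows.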

Step 2: Cloning the Superset Repository

To clone the Superset GitHub repository, run the command below:

$ git clone https://github.com/apache/incubator-superset.git

When the command completes successfully, a new folder named incubator-superset appears in the current directory. The project has since graduated from the Apache Incubator, and GitHub redirects this URL to the renamed apache/superset repository.
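If you rerun the setup, git clone refuses to write into an existing, non-empty incubator-superset folder. The sketch below, using the hypothetical helper names repo_dir_from_url and clone_once, skips the clone when the folder is already present. It derives the folder name the way git does for simple URLs like the one above, by dropping a trailing slash and a .git suffix; git's real rules cover more cases, so treat this as a simplification.

```shell
# repo_dir_from_url URL
# Prints the folder name "git clone URL" creates by default for simple URLs.
# Simplified: strips one trailing "/" and a ".git" suffix, then takes the last part.
repo_dir_from_url() {
  url="${1%/}"
  url="${url%.git}"
  printf '%s\n' "${url##*/}"
}

# clone_once URL
# Clones URL only if its default target folder does not exist yet.
clone_once() {
  dir=$(repo_dir_from_url "$1")
  if [ -d "$dir" ]; then
    echo "$dir already exists; skipping clone"
  else
    git clone "$1"
  fi
}
```

For example, clone_once https://github.com/apache/incubator-superset.git clones on the first run and only prints a notice afterwards.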

Step 3: Launching the Superset

Next, change into the newly created folder:

$ cd incubator-superset

From inside that directory, run:

$ docker-compose up

The command prints a wall of log output. The output should gradually slow down, which indicates that a Superset instance is now running on your machine.
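Rather than watching the log output, you can poll the instance until it answers. This sketch assumes curl is installed and that your Superset version serves the lightweight /health endpoint, which answers OK once the web server is up. The function name wait_for_superset and the default of 60 attempts, five seconds apart, are arbitrary choices.

```shell
# wait_for_superset [URL] [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a successful (non-error) HTTP response.
# Defaults assume the local Docker instance on port 8088.
wait_for_superset() {
  url="${1:-http://localhost:8088/health}"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors as well as connection errors
    if curl --silent --fail --max-time 5 "$url" >/dev/null; then
      echo "Superset is up at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Superset did not respond after $attempts attempts" >&2
  return 1
}
```

Run docker-compose up in one terminal and wait_for_superset in another. Because the function returns non-zero if the instance never comes up, it also works as a gate in scripts.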

Step 4: Providing the Login Details for the Superset

The local Superset instance comes with a Postgres server that it uses for storage. Once the instance is running, go to http://localhost:8088 to access Superset, and log in with the default username and password:

username: admin
password: admin
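The same default credentials can also be used against Superset's REST API. As a sketch, assuming your Superset version exposes the /api/v1/security/login endpoint with database ("db") authentication, the hypothetical helpers below build the JSON login body and post it with curl; the response should contain an access token. The payload builder does no JSON escaping, so it only suits simple usernames and passwords like these defaults.

```shell
# login_payload USER PASSWORD
# Prints a JSON body for a database-backed login.
# Assumes USER and PASSWORD contain no characters that need JSON escaping.
login_payload() {
  printf '{"username":"%s","password":"%s","provider":"db","refresh":true}\n' "$1" "$2"
}

# get_access_token [BASE_URL]
# Logs in as admin/admin and prints the raw JSON response from the server.
get_access_token() {
  base="${1:-http://localhost:8088}"
  curl --silent --fail \
    -H 'Content-Type: application/json' \
    -d "$(login_payload admin admin)" \
    "$base/api/v1/security/login"
}
```

Once you change the default password, pass your own credentials to login_payload instead.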

Method 2: Superset GitHub Configuration Through the Installation of Python Packages 

Step 1: Installing the Operating System Dependencies

Superset stores the details of every database connection in its metadata database and uses the cryptography Python library to encrypt the connection passwords. That library has operating system dependencies, so you need to install some OS packages before Superset will run.

On Ubuntu and Debian, install the necessary OS dependencies with the command below:

sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

The command to use for Ubuntu 20.04 is:

sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev

On Fedora and RHEL derivatives, the commands to use are:

sudo yum upgrade python-setuptools
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
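Because the right package list depends on the distribution, a small helper can pick the command for you. The sketch below, with the hypothetical name deps_install_cmd, only covers the cases listed above and prints the command instead of running it, so you can review it before using sudo. The ID and VERSION_ID values it expects are the ones found in /etc/os-release.

```shell
# deps_install_cmd ID VERSION_ID
# Prints the OS-dependency install command(s) from this guide for a distribution.
# Only the cases covered above are handled; anything else is reported as unsupported.
deps_install_cmd() {
  id="$1"
  version="$2"
  case "$id" in
    ubuntu|debian)
      if [ "$id" = ubuntu ] && [ "$version" = "20.04" ]; then
        echo "sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev"
      else
        echo "sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev"
      fi
      ;;
    fedora|rhel|centos)
      printf '%s\n' \
        "sudo yum upgrade python-setuptools" \
        "sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel"
      ;;
    *)
      echo "unsupported distribution: $id" >&2
      return 1
      ;;
  esac
}
```

On the machine itself, you could run . /etc/os-release && deps_install_cmd "$ID" "$VERSION_ID" to see the suggested command.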

On macOS, use the most recent version of the operating system so that known issues are already resolved, and then install the latest Xcode command-line tools with:

xcode-select --install

Using the system Python is not recommended. Instead, install Python with Homebrew and use its pip, as shown below:

brew install pkg-config libffi openssl python 
env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2 

Step 2: Installing the Python virtualenv

You are encouraged to install Superset inside a Python virtual environment. Python 3 ships with the venv module for this. If you prefer the separate virtualenv package and it is not installed, install it with your OS package manager or with the pip command below:

pip install virtualenv

Then create and activate the virtual environment with the commands below:

# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate

Once the virtual environment is activated, every Python program you run uses it. To stop using it, type “deactivate”.
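It is easy to forget to activate the environment in a new terminal and end up installing Superset into the system Python. The activation scripts of both venv and virtualenv export a VIRTUAL_ENV variable, so a guard like the hypothetical require_virtualenv below can check for it before any pip command runs.

```shell
# in_virtualenv
# Exit status 0 when a venv/virtualenv is active (its activate script sets VIRTUAL_ENV).
in_virtualenv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

# require_virtualenv
# Refuses to continue outside a virtual environment, so that
# "pip install apache-superset" does not touch the system Python.
require_virtualenv() {
  if in_virtualenv; then
    echo "Using virtual environment at $VIRTUAL_ENV"
  else
    echo "No virtual environment active; run: . venv/bin/activate" >&2
    return 1
  fi
}
```

For example, require_virtualenv && pip install apache-superset installs Superset only when an environment is active.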

Step 3: Installing and Launching the Superset GitHub

With the operating system dependencies and the virtual environment in place, the next step is to install and initialize Superset:

# Install superset
pip install apache-superset
 
# Initialize the database
superset db upgrade
 
# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
superset fab create-admin
 
# Load some data to play with
superset load_examples
 
# Create default roles and permissions
superset init
 
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
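The initialization steps above can also be scripted. The sketch below runs them in order and stops at the first failure. The admin details are placeholders to replace; they are passed as options to superset fab create-admin (the Flask-AppBuilder command, which accepts --username, --firstname, --lastname, --email and --password) so that no prompts appear. Setting DRY_RUN=1 prints the commands instead of running them. The helper names are hypothetical, and the long-running superset run command is left out so that you can start the web server separately.

```shell
# run_step CMD...
# Runs CMD, or only prints it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_superset
# Initializes Superset non-interactively; the admin details are placeholders.
setup_superset() {
  export FLASK_APP=superset
  run_step superset db upgrade || return 1
  run_step superset fab create-admin \
    --username admin --firstname Superset --lastname Admin \
    --email admin@example.com --password admin || return 1
  run_step superset load_examples || return 1
  run_step superset init || return 1
}
```

After setup_superset succeeds, start the development server with the superset run command shown above.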

Once Superset is running, open http://localhost:8088 in your browser and log in with the credentials you set when creating the admin account. Then refresh the metadata from the “Admin” pane under “Menu”. After the refresh, all data sources available to Superset are listed in the “Datasource” tab under the “Menu” pane. With these steps complete, you can start analyzing and visualizing your data.

Method 3: Superset GitHub Integration using Hevo

Hevo is a no-code data pipeline. It supports pre-built integrations with 100+ data sources, including GitHub, and offers a fully managed solution for migrating your data to Superset. It automates your data flow in minutes without any code, and its fault-tolerant architecture keeps your data secure and consistent. Hevo gives you an efficient, fully automated way to manage data in real time, so analysis-ready data is always available in Superset.

Hevo focuses on three simple steps to get you started:

  • Connect: Connect Hevo with GitHub and all your data sources by simply logging in with your credentials.
  • Integrate: Consolidate your data from several sources in Hevo’s Managed Data Integration Platform and automatically transform it into an analysis-ready form.
  • Visualize: Connect Hevo with your desired Reporting tool such as Superset and visualize your unified data easily to gain better insights.

Let’s look at some salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
  • Live Support: The Hevo team is available round the clock to support its customers through chat, email, and support calls.

Explore more about Hevo by signing up for a 14-day free trial today.

Conclusion

In this article, you learned about the unique features of Superset, the advantages of setting up Superset GitHub Integration, and three easy methods to implement it.

Integrating and analyzing data from a large set of diverse sources can be challenging. This is where Hevo comes into the picture. Hevo Data, a no-code data pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write code repeatedly. With its strong integration with 100+ sources and BI tools, Hevo lets you not only export and load data but also transform and enrich it and make it analysis-ready in a jiffy.
