Redash is a popular BI solution that gives users a Collaborative Visualization and Dashboard platform. Thousands of companies, including Cloudflare and SoundCloud, use it to run SQL queries and build dashboards that communicate results to decision-makers.

Databricks, on the other hand, is a Cloud-based Data Engineering platform that businesses widely use to process, transform, and explore large amounts of data. It can also be used for interactive analysis and for building Machine Learning applications.

This article walks you through setting up Redash Databricks Integration in 4 simple steps. It also gives a brief overview of Redash and Databricks, their key features, and the benefits of setting up the integration.

Prerequisites

You will have a much easier time setting up your Redash Databricks Integration if you have the following in place:

  • An active Redash account.
  • An active Databricks account.

Introduction to Redash

  • Redash is a popular Collaborative Dashboarding and Visualization tool that lets users work with data regardless of their technical background.
  • Compared to other Data Visualization platforms, Redash connects to a wide range of data sources. This makes it a favorite among organizations that manage their business processes across many applications.
  • You can integrate Redash with Data Warehouses, run SQL queries to select subsets of data for visualizations, and share dashboards with ease.
  • Overall, Redash can help your organization adopt a data-driven mindset, which is critical in today’s competitive business world.
  • With Redash Dashboards, the information you need about your business is at your fingertips. Redash is especially popular among SQL users because it lets them query their data directly.

Introduction to Databricks

  • Databricks is a popular Cloud-based Data Engineering platform for handling and manipulating large amounts of data. It helps you extract insights from your existing data and supports the development of AI (Artificial Intelligence) solutions.
  • It also offers Machine Learning libraries such as TensorFlow, PyTorch, and others for training and building Machine Learning Models.
  • Databricks provides an interactive workspace on a Zero-Management cloud platform. It allows Data Analysts, Data Scientists, and Developers to efficiently extract value from large amounts of data.
  • Furthermore, it integrates with third-party applications such as BI (Business Intelligence) and domain-specific tools to provide valuable insights.
  • Large-scale businesses use this platform for a wide range of tasks, including ETL (Extract, Transform, and Load), Data Warehousing, and Dashboarding Insights for internal and external users.
  • Today, enterprise customers use Databricks to run large-scale production operations across a wide range of industries, including Healthcare, Media and Entertainment, Finance, Retail, and many more.

Steps to Set Up Redash Databricks Integration

Now that you have a basic understanding of both technologies, let us go through the procedure for setting up Redash Databricks Integration. Follow the steps below:

Step 1: Log In to Databricks and Generate a Personal Access Token

The first step in setting up Redash Databricks Integration is to log in to your Databricks account. If you don’t have an account, you can sign up for one as shown below.

Databricks Sign Up Page

Now, follow the steps below to generate a Personal Access Token.

  • Navigate to your Databricks workspace and click the Settings Icon in the lower-left corner as shown below.
Databricks Workspace
  • Click User Settings as shown below.
User Settings in Databricks
  • Navigate to the Access Tokens tab and click on Generate New Token as shown below.
Access Tokens Tab in Databricks
  • Enter a description (comment) and expiration date if required and then click on Generate as shown below.
Generating Token in Databricks
  • Copy the generated token and keep it safe as this will be required in the later steps.
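Once you have the token, it is typically sent to Databricks as a Bearer credential in the HTTP Authorization header. The following is a minimal sketch of how that header could be assembled, assuming the token is kept in an environment variable. The variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN`, the placeholder host, and the helper functions are illustrative assumptions, not part of the original walkthrough. The request is only prepared, never sent.

```python
import os
import urllib.request


def build_auth_headers(token: str) -> dict:
    """Return HTTP headers that carry a Databricks personal access token."""
    token = token.strip()
    if not token:
        raise ValueError("personal access token is empty")
    # The token is passed as a Bearer credential.
    return {"Authorization": f"Bearer {token}"}


def build_request(workspace_host: str, path: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    url = f"https://{workspace_host}{path}"
    return urllib.request.Request(url, headers=build_auth_headers(token))


if __name__ == "__main__":
    # Hypothetical environment variable names; the defaults are placeholders.
    host = os.environ.get("DATABRICKS_HOST", "example.cloud.databricks.com")
    token = os.environ.get("DATABRICKS_TOKEN", "dapi-placeholder")
    request = build_request(host, "/api/2.0/clusters/list", token)
    print(request.full_url)
```

Reading the token from the environment rather than hard-coding it keeps it out of source control, which matters because the token grants the same access as your user account.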

Step 2: Copy the Connection Details for SQL Endpoint

After you have generated the Personal Access Token, follow the steps below to get the connection details for the SQL Endpoint.

  • Navigate to your Databricks workspace and click the SQL Endpoints icon in the sidebar.
  • Choose the endpoint you want to connect to.
  • Navigate to the Connection Details tab and copy the connection details as shown below.
Connection Details of SQL Endpoints
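Values copied from a web page often pick up stray whitespace or a leading `https://`. As a rough sketch, the copied details could be tidied and held together like this. It assumes the Connection Details tab exposes a server hostname, a port, and an HTTP path; the class name, function name, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointDetails:
    server_hostname: str
    http_path: str
    port: int = 443


def clean_endpoint_details(server_hostname: str, http_path: str, port: int = 443) -> EndpointDetails:
    """Normalize connection details pasted from the Databricks UI."""
    host = server_hostname.strip()
    # Drop a scheme if one was pasted along with the hostname.
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    path = http_path.strip()
    if not host or not path:
        raise ValueError("server hostname and HTTP path are both required")
    if not 0 < port < 65536:
        raise ValueError("port is out of range")
    return EndpointDetails(server_hostname=host, http_path=path, port=port)


if __name__ == "__main__":
    # Placeholder values; substitute the ones shown in your workspace.
    details = clean_endpoint_details(
        " https://example.cloud.databricks.com/ ",
        " /sql/1.0/endpoints/abc123 ",
    )
    print(details)
```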

Step 3: Log In to Redash and Select Databricks as a New Data Source

After you have completed Step 2, log in to your Redash account. If you don’t have an account, you can sign up for one as shown below.

Redash Sign Up Page

Now, follow the steps below to select Databricks as a New Data Source in Redash.

  • Click the Settings icon on the Redash Homepage to access the Data Sources management page as shown below.
Redash Homepage
  • Select “Databricks” as the data source from the available options as shown below.
Selecting Databricks as Data Source

Step 4: Configure the Connection to Set Up Redash Databricks Integration 

After you have selected Databricks as the data source, you will be prompted for the configuration details needed to set up Redash Databricks Integration as shown below.

Configuring Connections
  • Fill in the configuration details that you copied in Steps 1 and 2 as shown below.
Filling Configuration Details
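Before pasting values into the form, it can help to collect them in one place and confirm nothing is missing. The sketch below does that and masks the token for safe display. The field labels `Host`, `HTTP Path`, and `Access Token` are assumptions about how the Redash form names its inputs, so check them against what your Redash version shows. The function names and sample values are hypothetical.

```python
def redash_form_values(server_hostname: str, http_path: str, access_token: str) -> dict:
    """Collect the values to enter in the Redash Databricks data source form."""
    values = {
        "Host": server_hostname.strip(),      # assumed label; from Step 2
        "HTTP Path": http_path.strip(),       # assumed label; from Step 2
        "Access Token": access_token.strip(), # assumed label; from Step 1
    }
    missing = [label for label, value in values.items() if not value]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    return values


def masked(values: dict) -> dict:
    """Return a copy that is safe to print, with most of the token hidden."""
    shown = dict(values)
    token = shown["Access Token"]
    shown["Access Token"] = token[:4] + "*" * max(len(token) - 4, 0)
    return shown


if __name__ == "__main__":
    # Placeholder values only.
    values = redash_form_values(
        "example.cloud.databricks.com",
        "/sql/1.0/endpoints/abc123",
        "dapi12345678",
    )
    print(masked(values))
```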

You can now run SQL queries on Delta Lake tables as if they were any other Relational data source, and immediately visualize the query results in Redash as shown below.

SQL Queries on Delta Lake Tables
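To make "as if they were any other Relational data source" concrete, here is a small sketch of the kind of aggregate query you might paste into the Redash query editor. The table `events` and its columns are hypothetical. An in-memory SQLite database stands in for the Delta Lake table only so that the example runs locally; it is not how Redash talks to Databricks.

```python
import sqlite3

# Query text of the kind you might run from Redash.
# The table name and columns are hypothetical.
DAILY_EVENTS_SQL = """
SELECT event_date, COUNT(*) AS event_count
FROM events
GROUP BY event_date
ORDER BY event_date
"""


def run_locally(rows):
    """Run the query against an in-memory SQLite stand-in table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return conn.execute(DAILY_EVENTS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("2024-01-01", 1), ("2024-01-01", 2), ("2024-01-02", 3)]
    for event_date, event_count in run_locally(sample):
        print(event_date, event_count)
```

A result with one date column and one count column like this maps naturally onto a line or bar chart in a dashboard.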

With this, you have successfully set up your Redash Databricks Integration. It’s as simple as that.

Benefits of Setting Up Redash Databricks Integration

Some of the benefits of setting up Redash Databricks Integration include:

  • Redash Databricks Integration lets Databricks users move to a unified Data Analytics platform that can handle a wide range of data use cases, which can reduce costs and improve operational efficiency.
  • Redash Databricks Integration allows you to execute queries and present the results on shareable Dashboards and Visualizations. These results can be visualized in a variety of ways, including graphs, cohorts, and funnels.
  • Redash Databricks Integration enables fast query execution for Data Analytics and Data Science without moving data out of the Data Lake.

Frequently Asked Questions (FAQs)

Does Redash use SQL?
Yes. Redash supports many types of data sources, including SQL, NoSQL, Big Data, and API. It also provides a powerful online SQL editor for querying data from different sources to answer complex business questions.

Can I use Databricks on-premise?
At the moment, Databricks runs on every major public Cloud. However, the company is continuously exploring other deployment scenarios, including on-premise clusters.

Does Databricks interoperate with other Apache Spark distributions?
Yes. Databricks develops a web-based platform for Apache Spark. It runs 100% Spark, so any application developed on Databricks can run on any distribution compatible with Apache Spark.

What is Hive in Databricks?
Apache Hive is a Data Warehouse project for the analysis of large datasets stored in Hadoop’s HDFS and compatible file systems such as Amazon S3, Azure Data Lake Storage, Google Cloud Storage, etc. It provides an SQL interface for querying data in various Hadoop-compatible databases.

What are Delta and Parquet?
Delta, often referred to as Delta Lake, is an open-source storage layer that supports ACID transactions on Apache Spark. Apache Parquet, on the other hand, is an open-source column-oriented data storage format. Delta uses open-source Parquet files to store your data in your Cloud storage. Databricks is optimized for both Parquet and Delta.

Conclusion

  • In this article, you learned how to set up Redash Databricks Integration and got an overview of Redash and Databricks with their key features.
  • You also learned about the benefits of setting up this integration. You can now create your own Redash Databricks Integration to get the benefits of both platforms in one place.
  • Integrating Redash with Databricks is a fantastic option if you have large volumes of data in Databricks waiting to break out of silos and provide valuable insights.
  • However, Databricks doesn’t provide native connections (like Redash) with other data sources or databases like MySQL, PostgreSQL, MongoDB, etc., and you’ll spend a lot of time migrating data from these sources to Databricks.
  • Replicating the data from various sources into Databricks by writing custom scripts can prove to be a tedious process and can create a slew of errors and data consistency issues. This is where Hevo comes in.

Share your experience of setting up Redash Databricks Integration in the comments section below!

Ayush Poddar
Research Analyst, Hevo Data

Ayush is a Software Engineer with a strong focus on data analysis and technical writing. As a Research Analyst at Hevo Data, he authors articles on data integration and infrastructure using his proficiency in SQL, Python, and data visualization tools like Tableau and Power BI. Ayush's Bachelor's degree in Game and Interactive Media Design complements his technical expertise, enabling him to integrate cutting-edge technologies into his analytical workflows.
