Databricks SQL Analytics 101: A Comprehensive Guide

Today’s applications and devices are generating a lot of big data. This data should be stored for analytics purposes. When analyzed, businesses can extract insights from the data which can be good for decision making. Databricks is a Cloud Platform that provides Data practitioners with a Data Warehousing solution. Users can create Data Lakes for storage of both structured and unstructured data.

Table of Contents

When using Databricks, you will need to extract insights from your data. Databricks SQL Analytics makes this easy for you. It provides Databricks users with a user interface that they can use to perform SQL and BI workloads on their data lakes. This gives them an opportunity to learn more about their data and take the necessary action.

In this article, we will be discussing Databricks SQL Analytics in detail.

What is Databricks SQL Analytics?

Databricks SQL Analytics is a Databricks feature that offers its users a simple interface to perform SQL and BI workloads on a Data Lake. This Lakehouse Architecture is 9x better in terms of pricing and performance compared to the traditional cloud data warehouses.

It provides a SQL-native workspace where users can run performance-optimized SQL queries.

Databricks SQL analytics also enables users to create advanced visualizations, dashboards, and alerts. It also comes with connectors to BI tools such as Tableau and Power BI to allow for greater collaboration and maximum performance.

Prerequisites

An active Databricks Account.
Knowledge of SQL.

Critical aspects of Databricks SQL Analytics

You can access Databricks SQL Analytics from your Databricks workspace. You simply have to switch to the SQL Analytics Tab located on the bottom left corner of the window.

To understand Databricks SQL Analytics completely, let’s study the following critical aspects:

Run the First Query
Creating a Query
Query Filters
Query Parameters
Query Snippets

A) Run the First Query

The following steps will help you to run your first SQL query from the onboarding panel:

Step 1: Click on the “Run your first query” option in the onboarding panel. The Query Editor will pop up loaded with the following query:

-- We've given you a sample query below to help get started.
-- Just hit "Run" to run the query and get your results.
SELECT
  concat(pickup_zip, '-', dropoff_zip) as route,
  AVG(fare_amount) as average_fare
FROM
  `samples`.`nyctaxi`.`trips`
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT 1000

The query runs against the nyctaxi database of the SAMPLES catalog, which can be read in all workspaces.

Step 2: Click on the “Run” option. The results from the query will be shown after a few seconds.

B) Creating a Query

The following steps can help you to create a new query in Databricks SQL Analytics:

Step 1: Click on the “Create” icon from the sidebar and choose “Query”, or you can click on the “Queries” option from the sidebar and click on the “New Query” button.

Step 2: Select the Data Sources tab.

Databricks SQL Analytics - SQL Endpoint — Image Source

Step 3: Click on the dropdown icon and choose a SQL endpoint. If you need to filter the list, simply type some text in the text box.

When you have created your first query, all the available SQL endpoints will be displayed in alphabetical order. The icon shown next to the SQL endpoint represents its status.

C) Query Filters

A Query Filter in Databricks SQL Analytics helps you to reduce the amount of data that is shown in a visualization. It limits the data once it has been loaded onto your web browser.

Thus, filters are good for small datasets and in environments where query executions are costly, time-consuming, or rate-limited.

If you need to focus on a particular value, just alias your column as `<columnName>::filter`.

Consider the example given below:

SELECT action AS `action::filter`, COUNT(0) AS "count"
FROM events
GROUP BY action

To use a multi-select filter, alias your column to `<columnName>::multi-filter`. For example:

SELECT action AS `action::multi-filter`, COUNT (0) AS "count"
FROM events
GROUP BY action

Query filters can also be used on dashboards. The filter widget is normally shown behind every visualization where the filter has been added to the query.

D) Query Parameters

In Databricks SQL Analytics, query parameters enable you to substitute the values of a query during runtime. The query parameters are added between curly braces ({{ }}). A widget is shown above the results pane where the parameter value is set.

The following steps can help you to add a query parameter in Databricks SQL Analytics:

Step 1: Click on the “Add New Parameter” button ({{ }}). You can also type “Cmd + P”.

The parameter will be added to the text caret and the “Add Parameter” window will pop up.

Databricks SQL Analytics - Add parameter window — Image Source

You should enter the following values:

Keyword: The keyword to represent the parameter in your query.
Title: The title appears over the widget. By default, the title is similar to the keyword.
Type: It supports Text, Date, Date and Time, Number, Dropdown List, Date and Time (with seconds), and Query Based Dropdown List. It uses Text as the default type.

Step 2: Enter the keyword and then choose the parameter type.
Step 3: Click on the “Add Parameter” button.
Step 4: Set the value of the parameter in the parameter widget.
Step 5: Click on the “Apply Changes” option, and then click on the “Save” button.

If you need to re-run the query using a different parameter value, enter the new value in the widget and click on the “Apply Changes” option.

If you need to edit a particular query parameter, click on the gear icon located beside the parameter widget. To prevent the parameter from being changed by other users, click on the “Show Results Only” option.

Databricks SQL Analytics - Show Results Only Option — Image Source

E) Query Snippets

It is always easier to duplicate and modify previous work than to create something from scratch. This is especially the case with complex SQL queries likes JOINs and CASE expressions.

As the queries increase in number, it will become difficult for you to remember the queries with the statement that you need.

Query snippets are query segments that can be shared and triggered using auto-complete. Consider the snippet given below:

JOIN organizations g ON g.id = ${1:table}.g_id

The following steps can help you to create a query snippet in Databricks SQL Analytics:

Step 1: Click on the “Settings” gear icon located at the bottom sidebar and choose the “User Settings” option.
Step 2: Click on the “Query Snippets tab.
Step 3: Click on the “Create Query Snippet” option.
Step 4: Enter the snippet trigger in the “Trigger” field.
Step 5: Enter the snippet description, but this is optional.
Step 6: Enter the snippet in the “Snippet” field and click on the “Create” option.

For more information on the SQL features, you can visit the Databricks SQL page.

Conclusion

In this article, you have learned how to use Databricks SQL Analytics and perform the basic operations effectively. Databricks SQL Analytics allows you to execute lightning-fast queries on your lakehouse with data warehousing performance at data lake economics.

With the Built-in SQL Editor, visualizations, and dashboards, the Databricks SQL Analytics feature provides your SQL-savvy Data Analysts an alternative workspace to interact with an analytics-tuned cluster and share important business insights.

As you employ Databrick SQL Analytics for your firm and make data-driven decisions, you will observe a gradual growth in your business performance. As your business rapidly grows, astronomical volumes of data will be generated associated with the increasing number of customers and the sales of your products and services.

To efficiently handle all this data across various applications in your enterprise can be a time-consuming and resource-intensive job. You would require to devote a portion of your Engineering Bandwidth to Integrate, Clean, Transform and Load your data into a Data Warehouse or a destination of your choice for further Business analysis. This can be effortlessly automated by a Cloud-Based ETL Tool like Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline that assists you in seamlessly transferring data from a vast collection of sources into a Data Lake like Databricks, Data Warehouse, or a Destination of your choice to be visualized in a BI Tool. It is a secure, reliable, and fully automated service that doesn’t require you to write any code!

If you are using Databricks as a Data Lakehouse and Analytics platform in your business and searching for a No-fuss alternative to Manual Data Integration, then Hevo can effectively automate this for you. Hevo with its strong integration with 100+ sources & BI tools (Including 40+ Free Sources), allows you to not only export & load Data but also transform & enrich your Data & make it analysis-ready.

Want to Take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Check out the pricing details to get a better understanding of which plan suits you the most.

Share with us your experience of using Databricks SQL Analytics. Let us know in the comments section below!

Nicholas Samuel Technical Content Writer, Hevo Data

Nicholas Samuel is a technical writing specialist with a passion for data, having more than 14+ years of experience in the field. With his skills in data analysis, data visualization, and business intelligence, he has delivered over 200 blogs. In his early years as a systems software developer at Airtel Kenya, he developed applications, using Java, Android platform, and web applications with PHP. He also performed Oracle database backups, recovery operations, and performance tuning. Nicholas was also involved in projects that demanded in-depth knowledge of Unix system administration, specifically with HP-UX servers. Through his writing, he intends to share the hands-on experience he gained to make the lives of data practitioners better.

No-code Data Pipeline for Databricks

Try for free