Do you want to analyze your Microsoft SQL Server data in Databricks? Are you finding it challenging to set up a SQL Server Databricks integration? If so, you've come to the right place! This article will answer your questions and take the strain out of finding a truly efficient solution.

By the end of this article, you will have a good understanding of two of the most efficient methods for connecting Databricks to SQL Server. Read along to learn the steps required for setting up Databricks Connect to SQL Server.

Methods for Building Databricks Connect to SQL Server

There are multiple ways to set up Databricks Connect to SQL Server, but we have hand-picked two of the easiest methods to do so:

Method 1: Connecting SQL Server to Databricks using Hevo Data

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integrations for 150+ Data Sources (40+ free sources), it helps you transfer data from Microsoft SQL Server, Azure SQL Database, and even a SQL Server Database on Google Cloud to Databricks, and lets you visualize it in a BI tool.

Get Started with Hevo for free

Check Out What Makes Hevo Amazing:

  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.

Method 2: Using Custom Code to Connect Databricks to SQL Server

Follow the steps below to set up Databricks Connect to SQL Server by writing custom ETL Scripts. This will be an easy six-step process that begins with creating an SQL Server Database on Azure. In case you already have a SQL Server Database deployed either locally or on other Cloud Platforms such as Google Cloud, you can directly jump to Step 4 to connect your database.

Note: Here, we are using a Databricks setup deployed on Azure for tutorial purposes. However, you can apply the same procedure to connect a SQL Server Database to Databricks deployed on other clouds, such as AWS and GCP. The process and drivers involved remain the same.

Step 1: Create a New SQL Database

  • Log in to the Microsoft Azure portal using the appropriate credentials.
  • You can either click directly on the Create a Resource button or navigate to the left side vertical menu bar and select SQL Databases to get redirected to the Create SQL Database Screen.
Databricks Connect to SQL Server - Create a Resource
  • Fill in the required fields like Database Name and choose your desired server. Once you are done doing that, click on the Review+Create button to create your SQL Database on Azure.
Databricks Connect to SQL Server - Creating SQL Database

Step 2: Upload the Desired File to Databricks Cluster

  • Navigate to the left side menu bar on your Azure Databricks Portal and click on the Data Icon.
  • Now, click on the Add Data button.
Databricks Connect to SQL Server - Adding Data
  • Browse to the file that you wish to upload to the Azure Databricks cluster and then click on the Select button to upload it.
Uploading File for Databricks Connect to SQL Server

Step 3: Load the Data

  • Click on the Workspace icon present on the left side of the home page.
  • Click on the Create option and then select Notebook to write your own custom code.
Databricks Connect to SQL Server - Workspace
  • Now, provide a unique name to the Notebook and select Scala as the preferred language in which you wish to write the custom code.
  • By default, the Cluster’s name is pre-populated if you are working with a single cluster. In case you have created multiple clusters, you can select the desired cluster from the drop-down menu.
  • Finally, click on the Create button.
Databricks Connect to SQL Server - Creating new Notebook

Step 4: Create the JDBC URL and Properties

  • Firstly, you need to create a JDBC URL containing the connection information for your SQL Server deployment, whether it is local, on Azure, or on any other cloud platform. You also need to supply details such as the Port Number, User, and Password. You can refer to the following piece of code to do so:
// Replace the placeholders with your own host, port, and database name
val jdbcHostname = "<your-server>.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "<your-database>"
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
You can replace the Hostname, Port, and Database parameters as per your deployment. Once you've added your DB-specific details, your URL string will look like this:

Example: val url = "jdbc:sqlserver://azsqlhevoserver.database.windows.net:1433;database=azsqlhevodb;user=Rakesh;password=*******"
  • Now, create a java.util.Properties object to hold the connection credentials:
import java.util.Properties
val myproperties = new Properties()
myproperties.put("user", "Rakesh")
myproperties.put("password", "******")

Step 5: Check the Connectivity to the SQL Server Database

The code below registers the Microsoft JDBC driver class that Spark will use when connecting to the SQL Server database:

val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
myproperties.setProperty("Driver", driverClass)
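Setting the driver property alone does not exercise the network path. To verify connectivity end-to-end, you can also open (and immediately close) a test JDBC connection. The sketch below assumes the jdbcUrl, driverClass, and myproperties values defined above, and requires that your cluster has network access to the SQL Server instance:

import java.sql.DriverManager

// Load the SQL Server JDBC driver (pre-installed on Databricks clusters)
Class.forName(driverClass)

// Opening a connection fails fast if the host, port, database name,
// or credentials are wrong, or if a firewall is blocking the cluster
val connection = DriverManager.getConnection(jdbcUrl, myproperties)
println(s"Connection valid: ${connection.isValid(10)}")
connection.close()

If this snippet throws an exception, fix the connection details or firewall rules before moving on to the next step.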

Step 6: Read & Display the Data

  • To read the content of the file that you uploaded in Step 2, you can create a DataFrame. The columns in the first row of the file will be used as the DataFrame's column names if header = true is set. You can tell Spark to automatically infer the schema of each column by setting inferSchema = true.
val mydf = spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("File")
  • Lastly, to display the data, you can simply use the display function:
  display(mydf)
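Once connectivity is in place, the same jdbcUrl and myproperties can be used to move data in both directions between Databricks and SQL Server. In the sketch below, the table names dbo.SampleTable and dbo.UploadedData are hypothetical placeholders; replace them with your own tables:

// Read an existing SQL Server table into a Spark DataFrame
// ("dbo.SampleTable" is a placeholder table name)
val sqlDf = spark.read.jdbc(jdbcUrl, "dbo.SampleTable", myproperties)
display(sqlDf)

// Write the CSV data loaded above to SQL Server;
// "overwrite" mode replaces the target table if it already exists
mydf.write
    .mode("overwrite")
    .jdbc(jdbcUrl, "dbo.UploadedData", myproperties)

For recurring loads, you would typically use "append" mode instead of "overwrite" and add logic to deduplicate incoming records.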

Once you follow all the above steps in the correct sequence, you will be able to build Databricks Connect to SQL Server.

Limitations of Writing Custom Code to Set up Databricks Connect to SQL Server

There are a few limitations of using manual ETL scripts to connect Databricks to SQL Server. Some of them are listed below:

  • Manually writing ETL Scripts requires significant technical bandwidth.
  • If you need the data to be transferred in real-time, writing custom scripts to accomplish this can be tricky, as it can lead to a compromise in Data Accuracy and Consistency.
  • This approach is suitable for a one-time bulk insert. In most cases, you will need to execute a continuous load process to ensure that the destination always receives the latest data. In that case, you will need logic to handle the duplicate data in real-time.
  • You will be required to maintain and add to the custom scripts when you scale this solution. This can consume a lot of time and could result in errors and inconsistencies in the data.
  • During the implementation of this custom solution, you may not have technical assistance. If a problem is encountered, you may have limited help and resources.

Conclusion

This blog introduced you to two methods that can be used to set up Databricks Connect to SQL Server. It gives comprehensive information on how to connect Databricks to SQL Server. It also briefed you about SQL Server and Databricks along with their features. If you want to integrate data from various data sources such as SQL Server into your desired Database/destination like Databricks and seamlessly visualize it in a BI tool of your choice, Hevo Data is the right choice for you! It will help simplify the ETL and management process of data sources and destinations.

Visit our Website to Explore Hevo

Hevo Data provides users with a simpler platform for integrating data from 150+ Data Sources like SQL Server to Databricks for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice. It also provides a consistent and reliable solution to manage data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your preferred approach for setting up Databricks Connect to SQL Server. Let us know in the comments below!

Former Research Analyst, Hevo Data

Rakesh is a Cloud Engineer with a passion for data, software architecture, and writing technical content. He has experience writing articles on various topics related to data integration and infrastructure.
