Do you want to analyze your Microsoft SQL Server data in Databricks? Are you finding it challenging to set up the SQL Server Databricks Integration? If so, you’ve arrived at the right place! This article will answer your questions and take the strain out of finding a truly efficient solution.

By the end of this article, you will have a good understanding of two of the most efficient methods you can leverage to set up Databricks Connect to SQL Server. Read along to learn the steps involved in each method.

Methods for Building Databricks Connect to SQL Server

There are multiple ways to set up Databricks Connect to SQL Server, but we have hand-picked two of the easiest methods to do so:

Method 1: Connecting SQL Server to Databricks using Hevo Data

Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integrations with 150+ Data Sources (40+ free sources), Hevo helps you transfer data from Microsoft SQL Server, Azure SQL Database, and even a SQL Server Database on Google Cloud to Databricks, and lets you visualize it in a BI tool.

Get Started with Hevo for free

To connect Hevo to SQL Server, follow these steps:

Step 1: Configure SQL Server as the Source

  • Select PIPELINES from the Navigation Bar.
  • Then, in the Pipelines List View, click the + CREATE button.
  • On the Select Source Type page, choose the SQL Server variant.
  • Provide a unique name for your Source and your database credentials (username and password) to connect Hevo Data to Microsoft SQL Server. You will also need to supply details such as the host IP, port number, database name, and schema so that Hevo can connect to your Microsoft SQL Server database.
  • Click TEST CONNECTION. This button becomes active once all required fields have been filled in. Hevo’s built-in connectivity checker then verifies the connection settings you supplied.
  • Click TEST & CONTINUE to proceed with configuring the destination. This button also becomes active once all required fields have been filled in.

Step 2: Configure Databricks as your Destination

  • Sign up for Hevo or log in to your existing Hevo account. After logging in, you are taken to the Configure your Databricks Destination page.
  • Enter the Destination Name and Schema Name on the Configure your Databricks Destination page.
  • Click TEST CONNECTION. This button becomes active once all required fields have been filled in.
  • Click SAVE & CONTINUE. This button also becomes active once all required fields have been filled in.

Check Out What Makes Hevo Amazing:

  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.

Method 2: Using Custom Code to Connect Databricks to SQL Server

Follow the steps below to set up Databricks Connect to SQL Server by writing custom ETL Scripts. This will be an easy six-step process that begins with creating an SQL Server Database on Azure. In case you already have a SQL Server Database deployed either locally or on other Cloud Platforms such as Google Cloud, you can directly jump to Step 4 to connect your database.

Note: Here, we are using a Databricks setup deployed on Azure for tutorial purposes. However, you can apply the same procedure for connecting a SQL Server Database with Databricks deployed on other Clouds such as AWS and GCP. The process and drivers involved remain universal.

Step 1: Create a New SQL Database

  • Log in to the Microsoft Azure portal using the appropriate credentials.
  • You can either click directly on the Create a Resource button or navigate to the left side vertical menu bar and select SQL Databases to get redirected to the Create SQL Database Screen.
  • Fill in the required fields such as Database Name and choose your desired server. Once done, click on the Review + Create button to create your SQL Database on Azure.

Step 2: Upload the Desired File to Databricks Cluster

  • Navigate to the left side menu bar on your Azure Databricks Portal and click on the Data Icon.
  • Now, click on the Add Data button.
  • Browse to the file that you wish to upload to the Azure Databricks Cluster and then click on the Select button to upload it.

Step 3: Load the Data

  • Click on the Workspace icon present on the left side of the home page.
  • Click on the Create option and then select Notebook to write your own custom code.
  • Now, provide a unique name to the Notebook and select Scala as the preferred language in which you wish to write the custom code.
  • By default, the Cluster’s name is pre-populated if you are working with a single cluster. In case you have created multiple clusters, you can select the desired cluster from the drop-down menu.
  • Finally, click on the Create button.

 Step 4: Create the JDBC URL and Properties 

  • First, you need to create a JDBC URL containing the connection details of your SQL Server deployment, whether it is local, on Azure, or on any other Cloud platform. You also need to add details such as the Port Number, User, and Password. You can refer to the following piece of code to do so:
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
You can replace the Hostname, Port, and Database parameters as per your deployment (a short sketch of these variable definitions is included at the end of this step). Once you’ve added your DB-specific details, your URL string will look like this:

Example: val url = "jdbc:sqlserver://azsqlhevoserver.database.windows.net:1433;database=azsqlhevodb;user=Rakesh;password=*******"
  • Now, create a Properties object to hold the connection parameters such as the user and password:
import java.util.Properties
val myproperties = new Properties()
myproperties.put("user", "Rakesh")
myproperties.put("password", "******")

Step 5: Check the Connectivity to the SQL Server Database

The code below sets the SQL Server JDBC driver class on the connection properties; the sketch that follows shows one way to verify that the connection actually works:

val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
myproperties.setProperty("Driver", driverClass)
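
Setting the driver property on its own does not open a connection. If you want to confirm that Databricks can actually reach your SQL Server database, one option is to run a lightweight query through Spark's JDBC reader. The following is a minimal sketch, assuming the jdbcUrl and myproperties values from Step 4:

// Run a lightweight query over JDBC to verify the connection.
// The query is wrapped as a derived table, which Spark's JDBC reader accepts in place of a table name.
val connectionTest = spark.read.jdbc(jdbcUrl, "(SELECT @@VERSION AS version) AS tmp", myproperties)
connectionTest.show()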

Step 6: Read & Display the Data

  • To read the content of the file that you uploaded in Step 2, you can create a DataFrame. If header = true is set, the values in the first row of the file are used as the DataFrame’s column names. You can tell Spark to automatically infer the schema of each column by setting inferSchema = true.
val mydf = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("File")
  • Lastly, to display the data, you can simply use the display function:
  display(mydf)
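
If your goal is to land the uploaded data in SQL Server, you can push the DataFrame back over the same JDBC connection. The following is a minimal sketch, assuming the jdbcUrl and myproperties values from Step 4; dbo.my_table is a hypothetical table name, so replace it with your own target table.

// Write the DataFrame to SQL Server over JDBC. "dbo.my_table" is a hypothetical table name.
mydf.write
  .mode("append")  // or "overwrite", depending on your load strategy
  .jdbc(jdbcUrl, "dbo.my_table", myproperties)

// Reading the table back is a quick way to confirm the round trip.
val fromSqlServer = spark.read.jdbc(jdbcUrl, "dbo.my_table", myproperties)
display(fromSqlServer)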

Once you follow all the above steps in the correct sequence, you will be able to build Databricks Connect to SQL Server.

Limitations of Writing Custom Code to Set up Databricks Connect to SQL Server

There are a few limitations of using manual ETL Scripts to connect Databricks to SQL Server. Some of them are listed below:

  • Manually writing ETL Scripts requires significant technical bandwidth.
  • If you need the data to be transferred in real-time, writing custom scripts to accomplish this can be tricky, as it can lead to a compromise in Data Accuracy and Consistency.
  • This approach is suitable for a one-time bulk insert. In most cases, you will need to execute a continuous load process to ensure that the destination always receives the latest data. In that case, you will need logic to handle the duplicate data in real-time.
  • You will be required to maintain and add to the custom scripts when you scale this solution. This can consume a lot of time and could result in errors and inconsistencies in the data.
  • During the implementation of this custom solution, you may not have technical assistance. If a problem is encountered, you may have limited help and resources.

Use Cases of Azure Databricks SQL Server Integration

  • Big Data Analytics: Azure Databricks enables data engineering and data science teams to process large datasets using Spark. On the other hand, SQL can streamline the process of gaining insights from huge volumes of data through a variety of conditional commands. Thus, Azure Databricks SQL Server integration makes Big Data Analytics efficient.
  • Healthcare: Healthcare organizations use SQL to analyze large data sets containing information about patient behavior, medical conditions, and demographics. It facilitates medical professionals to gain critical insight, which contributes toward developing specific solutions for solving problems. Integrating Azure Databricks with SQL Server can optimize the functioning of the healthcare sector. 
  • Social Media: Social media networks process significant amounts of data every second. Platforms such as Snapchat and Instagram use SQL to store users’ profile information. When users create new posts or share photos, SQL is used to update the app database. It also enables messages to be recorded so that users can retrieve them later.

Conclusion

This blog introduced you to two methods that can be used to set up Databricks Connect to SQL Server. It gave comprehensive information on how to connect Databricks to SQL Server, and it also briefed you on SQL Server and Databricks along with their features. If you want to integrate data from various data sources such as SQL Server into your desired Database/destination like Databricks and seamlessly visualize it in a BI tool of your choice, Hevo Data is the right choice for you! It will help simplify the ETL and management process of data sources and destinations.

Visit our Website to Explore Hevo

Hevo Data provides users with a simpler platform for integrating data from 150+ Data Sources like SQL Server to Databricks for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice. It also provides a consistent and reliable solution to manage data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your preferred approach for setting up Databricks Connect to SQL Server. Let us know in the comments below!

Former Research Analyst, Hevo Data

Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies, including API integration and machine learning. The combination of technical skills and a flair for writing brought him to the field of writing on highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.
