Marc Benioff, Parker Harris, Dave Moellenhoff, and Frank Dominguez founded Salesforce.com in 1999, and it has grown to become one of the most dominant software companies globally.

Salesforce is a cloud-based Customer Relationship Management (CRM) platform that offers a comprehensive CRM solution to businesses without the need for extensive development.

Databricks, founded by the creators of Apache Spark, is a cloud-based data engineering platform used to store, transform, and visualize large amounts of data seamlessly.

Integrating Databricks with Salesforce ensures lead and prospect data is automatically synchronized, providing executives with up-to-date information and context for engaging with leads and customers.

What is Databricks?


Databricks is a popular Cloud-based Data Engineering platform developed by the creators of Apache Spark. It deals with large amounts of data and allows you to easily extract valuable insights from it. With the main focus on Big Data and Analytics, it also assists you in the development of AI (Artificial Intelligence) and ML (Machine Learning) solutions. Machine Learning libraries such as TensorFlow, PyTorch, and others can be used for training and developing Machine Learning models.

Key Features of Databricks

Databricks comprises a variety of features that help users work more efficiently on the Machine Learning Lifecycle. Some of the key features of Databricks include:

  • Interactive Notebooks: Databricks’ interactive notebooks provide users with a variety of languages (such as Python, Scala, R, and SQL) and tools for accessing, analyzing, and extracting new insights.
  • Integrations: Databricks can be easily integrated with a variety of tools and IDEs (Integrated Development Environment), including PyCharm, IntelliJ, Visual Studio Code, etc., to make Data Pipelining more structured.
  • Delta Lake: Databricks houses an open-source Transactional Storage layer that can be used for the whole data lifecycle. This layer brings data scalability and reliability to your existing Data Lake.

What is Salesforce?


For plenty of Entrepreneurs, Business Owners, and Corporations, Salesforce is the king of CRM. Salesforce is the most popular and robust Cloud-Based CRM software designed to support organizations in managing their Sales and Marketing data.

Salesforce will help you to accomplish several Marketing Goals by storing and keeping track of all your Customer Data, Contact Data, and Marketing Leads. You can also generate Sales Forecast Reports with Salesforce to convert your leads.

Salesforce also supports Email Integration with applications like Microsoft Outlook, Gmail, etc. It covers nearly every aspect of running business operations and managing customers.

Salesforce follows a subscription-based model and offers a variety of pricing options, ranging from $25 to about $300 per user every month.

Key Features of Salesforce

Let’s look at some of the key features of Salesforce responsible for its immense popularity:

  • Account Management: Salesforce gives companies a complete picture of their customers. At any moment, users can access Activity Logs, Customer Conversations, Contacts, Internal Account Discussions, and other data.
  • Lead Management: Salesforce helps firms track leads and optimize campaign performance across all marketing platforms. As a result, they will be more equipped to make decisions about how and where to spend their marketing budget.
  • Analytics and Forecasting: Salesforce comprises customizable dashboards that display key performance metrics and reporting capabilities for your business. These analytics can assist you in making more informed business decisions.

You can think of combining your Salesforce data with data coming from various other sources in your Data Warehouse to get valuable insights for your business.

Why Migrate from Databricks to Salesforce?

It is not an unusual sight to have multiple disparate sources of data from the Product, Marketing, and Sales departments in a company’s Data Infrastructure. Many organizations deploy a central repository like Databricks to unify and manage all the Data Sources at a single location.

A Databricks Salesforce connection allows you to further bring all your customer data from your Data Repository into the hands of your Sales & Support Teams.

You can add incredible value to your organization after successfully establishing the Databricks Salesforce integration. Here’s what you’ll get after connecting Databricks and Salesforce.

  • A Databricks Salesforce connection offers the Sales Teams rich first-party data helping them close new business opportunities.
  • It allows the Marketing Teams to leverage the insightful customer data to create customized, high-quality Email drip campaigns.
  • Product Teams can use valuable data to understand and enhance the customer experience.
  • It also helps in tracking and monitoring the progress of tasks by pushing product data for the Account Managers to know what actions are being taken in the app.
  • A Databricks Salesforce connection reduces churn by synchronizing health scores and churn events to Salesforce CRM for the Account Managers to track.

Now that you’re familiar with Databricks and Salesforce, let’s dive straight into the Databricks Salesforce connection.

Databricks Salesforce Integration

Getting data into Databricks is a fairly simple task. The difficult and less publicized part is getting data out of Databricks and loading it into operational tools like Salesforce.

This article on Databricks Salesforce integration will focus on exporting CSV data from Databricks and loading it into Salesforce using the Databricks CLI and the Salesforce Data Import Wizard respectively.

Follow the steps below to move your data from Databricks to Salesforce.

Step 1: Use Databricks CLI to Export CSV

Databricks provides users with a CLI (Command-Line Interface) to interact with Databricks Clusters. You can easily access your CSV file in the Databricks File System (DBFS) using the Databricks CLI and export it to your desired location.

You can use the cp command to export the selected CSV files from the DBFS into your local storage.

Run the following command to call the Databricks File System command:

databricks fs

Now, you need the cp command to copy the file. You can use options for additional configuration.

 cp         copies files to or from DBFS
    Options:
      -r, --recursive
      --overwrite

Run the cp command as shown below:

databricks fs cp dbfs:/myfolder/extract.csv ./extract.csv

This command copies the extracted CSV file from the DBFS folder and stores it under the current directory. 

You can add the -r option if your use case requires you to do this recursively for an entire folder.

databricks fs cp -r dbfs:/myfolder .

Note: The dot “.” at the end represents the current local folder.
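
If you run these exports regularly, the copy commands above can be wrapped in a small script. The following is a minimal Python sketch, assuming the Databricks CLI is installed and already configured on the machine. The function names build_cp_command and export_from_dbfs are hypothetical helpers introduced here for illustration; only the cp subcommand and its -r and --overwrite options come from the CLI help shown earlier.

```python
import shlex
import subprocess
from typing import List


def build_cp_command(source: str, target: str,
                     recursive: bool = False,
                     overwrite: bool = False) -> List[str]:
    """Build the argument list for `databricks fs cp` (hypothetical helper)."""
    command = ["databricks", "fs", "cp"]
    if recursive:
        command.append("-r")           # copy an entire folder
    if overwrite:
        command.append("--overwrite")  # replace files that already exist
    command += [source, target]
    return command


def export_from_dbfs(source: str, target: str,
                     recursive: bool = False,
                     overwrite: bool = False) -> None:
    """Run the copy; raises CalledProcessError if the CLI exits non-zero.

    Assumes the `databricks` executable is on the PATH and configured.
    """
    subprocess.run(
        build_cp_command(source, target, recursive, overwrite),
        check=True,
    )


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch also works
    # on a machine where the Databricks CLI is not installed.
    print(shlex.join(
        build_cp_command("dbfs:/myfolder/extract.csv", "./extract.csv")
    ))
```

Keeping the command construction separate from the subprocess call makes it easy to check what will be executed before anything touches DBFS.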

Step 2: Use Salesforce Data Import Wizard to Import CSV

Now that you’ve exported the CSV file from Databricks, it’s time to import it into Salesforce to complete the Databricks Salesforce connection. Data Import Wizard is Salesforce’s native feature that allows users to easily import up to 50,000 records at a time. Follow the steps below to upload your CSV file.

  • Go to “Setup” in Salesforce and search for “Data Import Wizard” in the “Quick Find” bar.
  • Click on “Data Import Wizard”.
  • Go through the prompt information and click on “Launch Wizard”.
  • Now, you’ll be prompted to select the type of data that you’re importing. You can either choose “Standard Objects” to import accounts, leads, contacts, solutions, articles, etc., or “Custom Objects” to import custom data.
  • Next, decide your type of import: “Add new records”, “Update existing records”, or “Add new and update existing records”.
  • Fill in the rest of the required fields depending on your use case. 
  • You will then be prompted to upload your CSV file.
  • You can select a Comma or Tab for the “Values Separated By”.
  • Once you’re done with the configuration, click on “Next”.
  • Now, for Data Consistency, you need to map the fields between your source CSV data and target data.
  • Review the mapping and start your import.
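
Because the Data Import Wizard accepts up to 50,000 records at a time, a larger export has to be loaded in several passes. The following is a minimal Python sketch of one way to prepare such a file, assuming a UTF-8, comma-separated CSV with a single header row. The function name split_csv and the "_partN" file naming are hypothetical choices made for illustration, not part of Databricks or Salesforce.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import List

MAX_RECORDS = 50_000  # per-import limit of the Data Import Wizard, as stated above


def split_csv(source: Path, out_dir: Path,
              max_records: int = MAX_RECORDS) -> List[Path]:
    """Split `source` into files holding at most `max_records` data rows.

    The header row is repeated at the top of every output file so each
    part can be imported and mapped on its own.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty file: nothing to split
        while True:
            # Pull the next batch of rows from the reader.
            chunk = list(islice(reader, max_records))
            if not chunk:
                break
            part_path = out_dir / f"{source.stem}_part{len(parts) + 1}.csv"
            with part_path.open("w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            parts.append(part_path)
    return parts
```

For example, split_csv(Path("extract.csv"), Path("parts")) would write parts/extract_part1.csv, parts/extract_part2.csv, and so on, and each file can then be uploaded through the wizard in turn.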

That’s it. With some basic knowledge of the Databricks CLI, you can easily establish a Databricks Salesforce connection. However, this method has its own limitations, not the least of which is the time it takes every time you need to export and load a CSV.

You can opt for a third-party solution if you don’t want to spend a lot of time importing CSV and resolving data issues.

Conclusion

Databricks Salesforce integration is a good option to unleash the full potential of your customer data. Having all your customer data in Salesforce allows your Sales & Support Teams to make the most of it by studying and understanding the customer patterns, context, and trends.

This article introduced you to Databricks and Salesforce and later provided you with a step-by-step guide to establishing a Databricks Salesforce connection. 

If you are an advanced user of Salesforce, you are most probably dealing with a lot of Data Sources, both internally and from other Software-as-a-Service (SaaS) offerings.

Raj Verma
Business Analyst, Hevo Data

Raj, a data analyst with a knack for storytelling, empowers businesses with actionable insights. His experience, from Research Analyst at Hevo to Senior Executive at Disney+ Hotstar, translates complex marketing data into strategies that drive growth. Raj's Master's degree in Design Engineering fuels his problem-solving approach to data analysis.
