Connect Kafka to Salesforce: A Comprehensive Guide

on Data Integration, Data Warehouse, ETL, Tutorials • August 17th, 2020

Do you want to transfer your Salesforce data using Kafka? Are you finding it difficult to connect Kafka to Salesforce? Well, look no further! This article will answer your questions and relieve you of the stress of finding an efficient solution. Follow our easy step-by-step guide to master the skill of efficiently transferring your data from Salesforce using Kafka.

It will help you take charge in a hassle-free way without compromising efficiency. This method is aimed at making the data export process as smooth as possible.

After a complete walkthrough of the content, you will be able to connect Kafka to Salesforce and transfer data to the destination of your choice for real-time analysis. It will further help you build a customized ETL pipeline for your organization. Through this article, you will gain a deeper understanding of the tools and techniques mentioned, which will help you hone your skills further.

Introduction to Kafka

Apache Kafka is an open-source distributed streaming platform that transfers data from one system to another in real time. Kafka, written in Scala and Java, relies on Brokers to move data in a fault-tolerant manner and organizes it by topic. It provides a high-throughput, low-latency distributed commit log and a robust queue that can handle a high volume of data.

Key features of Kafka:

  • Scalability: Kafka has exceptional scalability and can be scaled easily without downtime.
  • Data Transformation: Kafka offers Kafka Streams (KStream) and KSQL (in the case of Confluent Kafka) for on-the-fly data transformation.
  • Fault-Tolerant: Kafka replicates data across Brokers and persists it to disk, which makes it a fault-tolerant system.
  • Security: Kafka can be combined with various security measures like Kerberos to stream data securely.
  • Performance: Kafka is distributed and partitioned, and offers very high throughput for publishing and subscribing to messages.

For further information on Kafka, you can check the official website here.

Introduction to Salesforce

Salesforce is a cloud-based CRM tool that helps you maintain and manage your organization’s interactions with its customer base. Salesforce generates a lot of data from managing these interactions. It also offers cloud-based tools such as data analytics and IoT products. These generate data that provides valuable insights about customers and can be extremely useful for the organisation.

Key features of Salesforce:

  • Contact Management: Salesforce offers smooth contact management by providing access to critical customer data and interaction history. It uses trends and metrics to give you a better understanding of customer behaviour and help you formulate strategies.
  • Dynamic Dashboards: Salesforce’s interactive dashboards provide a complete view of how well the business is performing using key factors such as market trends, customer behaviour, etc. You can easily create dashboards and generate real-time reports for your business.
  • Opportunity Management: It is one of the best features of Salesforce. It provides you with an in-depth view of the customers’ timeline, their buying patterns, metrics, etc., and lets you strategise your next move.
  • Email Integrations: Salesforce supports full integration with applications like Microsoft Outlook, Gmail, etc., and lets you synchronise your calendars and schedules. It even provides offline access to important emails and lets you develop personalised templates for potential customers.

For further information on Salesforce, you can check the official site here.

Simplify your data analysis with Hevo’s No-code Data Pipelines

Hevo Data, a No-code Data Pipeline, helps you transfer data from 100+ sources (including 40+ free data sources like Salesforce) to any destination of your choice and visualize it in your desired BI tool. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using BI tools. 

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely easy for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

Prerequisites

  • Working knowledge of Salesforce.
  • Working knowledge of Kafka.
  • A general idea about the Apex programming language.
  • A general idea about databases and their operations.

Using Streaming APIs to connect Kafka to Salesforce

Salesforce Streaming APIs allow Kafka to capture real-time events from Salesforce over a long-lived HTTP connection. Each event describes a change to a Salesforce Object (sObject) record, such as a create, update, delete, or undelete. You define which records and fields you are interested in with a query written in Salesforce Object Query Language (SOQL).

This method can be implemented using the following steps:

Step 1: Creating a PushTopic record in Salesforce

The easiest way to create a new PushTopic record is by using the Salesforce Developer Console, an integrated development environment that you can use to create, debug, and test applications in Salesforce.
From the Debug menu in the Developer Console, select Open Execute Anonymous Window and paste the following Apex code to create a PushTopic record:

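// Define a PushTopic that streams changes to Contact records.
PushTopic pushTopic = new PushTopic();
pushTopic.Name = 'ContactUpdates';
// The fields selected here are the ones included in each event.
pushTopic.Query = 'SELECT Id, Name FROM Contact';
pushTopic.ApiVersion = 36.0;
// Generate events for create, update, undelete, and delete operations.
pushTopic.NotifyForOperationCreate = true;
pushTopic.NotifyForOperationUpdate = true;
pushTopic.NotifyForOperationUndelete = true;
pushTopic.NotifyForOperationDelete = true;
// Evaluate changes to fields referenced in the query's SELECT and WHERE clauses.
pushTopic.NotifyForFields = 'Referenced';
insert pushTopic;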

The above Apex code creates a PushTopic called ContactUpdates, which generates a new event every time a user creates, modifies, deletes, or undeletes a contact. Each event carries the Id and Name of the affected contact.
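
For orientation, a notification on this PushTopic arrives as a JSON message roughly like the one in the sketch below. The record values are made up, describeEvent is a hypothetical helper that is not part of any Salesforce library, and the exact set of fields inside event depends on the API version.

```javascript
// Illustrative PushTopic notification for the ContactUpdates topic.
// The Id and Name values are placeholders, not real records.
const sampleMessage = {
    channel: "/topic/ContactUpdates",
    data: {
        event: { type: "updated", createdDate: "2020-08-17T10:15:30.000Z" },
        sobject: { Id: "003000000000001AAA", Name: "Jane Doe" }
    }
};

// Hypothetical helper: summarise a notification as "<type> <Id> (<Name>)".
function describeEvent(message) {
    const { event, sobject } = message.data;
    return `${event.type} ${sobject.Id} (${sobject.Name})`;
}

console.log(describeEvent(sampleMessage));
```

Only the fields named in the PushTopic query (Id and Name here) appear under sobject, so any field you want downstream has to be added to the SOQL query first.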

Step 2: Installing the Kafka connector for Salesforce

To install the Kafka connector for Salesforce, start your Kafka server and use the following command:

sudo npm install -g salesforce-kafka-connect

This command will download and install the Kafka connector for Salesforce on your system.

Step 3: Configuring the Salesforce Streaming Events

To configure the Kafka connector, begin by using the following command:

sudo nano /usr/lib/node_modules/salesforce-kafka-connect/config/default.js

The configuration file will open as shown below. Replace the username and password fields with your credentials to allow Kafka to access the data associated with your account.

"use strict";
const path = require("path");
const config = {
    kafka: {
        zkConStr: "localhost:2181/",
        logger: null,
        groupId: "kc-salesforce-group",
        clientName: "kc-salesforce-client",
        workerPerPartition: 1,
        options: {
            sessionTimeout: 8000,
            protocol: ["roundrobin"],
            fromOffset: "earliest", //latest
            fetchMaxBytes: 1024 * 100,
            fetchMinBytes: 1,
            fetchMaxWaitMs: 10,
            heartbeatInterval: 250,
            retryMinTimeout: 250,
            autoCommit: true,
            autoCommitIntervalMs: 1000,
            requireAcks: 1,
            //ackTimeoutMs: 100,
            //partitionerType: 3
        }
    },
    topic: "sf-test-topic",
    partitions: 1,
    maxTasks: 1,
    maxPollCount: 5,
    pollInterval: 250,
    produceKeyed: true,
    produceCompressionType: 0,
    connector: {
        username: "user",
        password: "password",
        loginUrl: "https://user.salesforce.com",
        streamingSource: {
            batchSize: 5,
            topic: "StreamingTopic",
            kafkaTopic: "sf-test-topic",
            idProperty: "id"
        },
        restSink: {
            sObject: "sobject",
            idProperty: "id",
            batchSize: 500
        }
    },
    http: {
        port: 3149,
        middlewares: []
    },
    enableMetrics: true
};
module.exports = config;
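
If you would rather not store credentials in the file itself, one option is to overlay them from environment variables at startup. The following is a minimal sketch under stated assumptions: withCredentials is a hypothetical helper, and SF_USERNAME, SF_PASSWORD, and SF_LOGIN_URL are variable names chosen for this example, not something the connector reads on its own.

```javascript
// Hypothetical helper: return a copy of the connector config with the
// credentials taken from an environment-like object instead of the file.
// SF_USERNAME, SF_PASSWORD and SF_LOGIN_URL are assumed variable names.
function withCredentials(config, env) {
    const missing = ["SF_USERNAME", "SF_PASSWORD"].filter((name) => !env[name]);
    if (missing.length > 0) {
        throw new Error(`Missing environment variables: ${missing.join(", ")}`);
    }
    return {
        ...config,
        connector: {
            ...config.connector,
            username: env.SF_USERNAME,
            password: env.SF_PASSWORD,
            // Fall back to the URL from the file when none is provided.
            loginUrl: env.SF_LOGIN_URL || config.connector.loginUrl
        }
    };
}

// Example with a trimmed-down config and a fake environment.
const baseConfig = {
    topic: "sf-test-topic",
    connector: { username: "user", password: "password", loginUrl: "https://user.salesforce.com" }
};
const resolved = withCredentials(baseConfig, { SF_USERNAME: "me@example.com", SF_PASSWORD: "secret" });
console.log(resolved.connector.username);
```

In a real setup you would pass process.env and the object exported by default.js. The helper copies the config instead of mutating it, so the placeholder values in the file stay untouched.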

Perform the same operation and replace the username and password parameters with your credentials in the sink-config and source-config files. You can access the files by using the following file paths:

/usr/lib/node_modules/salesforce-kafka-connect/test/sink-config.js
/usr/lib/node_modules/salesforce-kafka-connect/test/source-config.js

Once you have configured your Salesforce data source to create events, you need to set up a source connector that will retrieve the data from the Salesforce Streaming APIs and transfer it to Kafka. You can make use of the salesforce-kafka-connect package to implement this. You can open index.js using the following file path:

/usr/lib/node_modules/salesforce-kafka-connect/index.js

The streaming of Salesforce data into Kafka is handled by the runSourceConnector function.

Function to run source connector.
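
As a rough sketch of what calling this function could look like: the code below assumes that runSourceConnector is called as runSourceConnector(config, converters, onError) and returns a promise that resolves once the connector is running. That signature is an assumption and should be checked against the package's README. To keep the example runnable on its own, the connector function is passed in as an argument, and startConnector and fakeRunSourceConnector are hypothetical names used only here.

```javascript
// Start a connector and collect the errors it reports.
// `runConnector` is expected to behave like the package's
// runSourceConnector(config, converters, onError); this is an assumption.
function startConnector(runConnector, config) {
    const errors = [];
    const onError = (error) => {
        errors.push(error);
        console.error("connector error:", error.message);
    };
    // No converters are registered here, hence the empty array.
    return runConnector(config, [], onError).then((running) => ({ running, errors }));
}

// Stand-in used only so this sketch runs without Salesforce or Kafka.
const fakeRunSourceConnector = (config, converters, onError) =>
    Promise.resolve({ topic: config.topic, stop: () => {} });

const started = startConnector(fakeRunSourceConnector, { topic: "sf-test-topic" });
started.then(({ running }) => console.log("streaming into", running.topic));
```

To try this against a real setup, you would replace fakeRunSourceConnector with the function imported from salesforce-kafka-connect and pass the configuration object from the previous step.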

Once you have called the function, you can start transferring data from Salesforce. Use the following command in your command-line interface to see the available options:

nkc-salesforce-source --help

This is how you can configure the Salesforce Streaming Events and facilitate the process of connecting Kafka to Salesforce.

Step 4: Configuring the ETL Pipeline for connecting Kafka to Salesforce

Data fetched from the Salesforce Streaming APIs is stored in Kafka as ProducerRecords. To process these records on the way out of Kafka, you need to set up the Kafka sink. The runSinkConnector function can help you perform this operation.

Function to run sink connector.

You can use the following command in your command-line interface to call this function:

nkc-salesforce-sink --help

Once you have called the function, each message has to be converted into a record before it is passed on. To do this, you can create a function called messageToProducerRecord as follows:

Passing message from Kafka to Salesforce.
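
As a rough illustration of what a function with this name might do, the sketch below converts a PushTopic notification into a keyed record for a Kafka topic. Everything about it is an assumption made for this example: the { topic, key, value } record shape, the idProperty parameter, and the sample notification are hypothetical and not taken from the connector's source.

```javascript
// Hypothetical conversion of a PushTopic notification into a keyed record.
// The { topic, key, value } shape is an assumption for illustration.
function messageToProducerRecord(message, kafkaTopic, idProperty = "Id") {
    const data = (message && message.data) || {};
    const sobject = data.sobject;
    if (!sobject || sobject[idProperty] === undefined) {
        throw new Error(`message has no sobject.${idProperty}`);
    }
    return {
        topic: kafkaTopic,
        // Using the record id as the key lets a key-hashing partitioner
        // send all changes to one contact to the same partition.
        key: String(sobject[idProperty]),
        value: JSON.stringify({
            type: data.event ? data.event.type : null,
            payload: sobject
        })
    };
}

// Made-up notification; the Id and Name are placeholders.
const notification = {
    channel: "/topic/ContactUpdates",
    data: {
        event: { type: "created" },
        sobject: { Id: "003000000000001AAA", Name: "Jane Doe" }
    }
};
const record = messageToProducerRecord(notification, "sf-test-topic");
console.log(record.key, record.value);
```

Rejecting messages without an id up front avoids writing records with an undefined key, which would be hard to trace back to a Salesforce record later.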

This is how you can use the Salesforce Streaming APIs to connect Kafka to Salesforce and transfer your data in real-time.

Conclusion

This article teaches you how to connect Kafka to Salesforce. It also explains the concepts behind every step to help you understand and implement them efficiently. This method, however, can be challenging, especially for a beginner, and this is where Hevo saves the day. Hevo Data, a No-code Data Pipeline, helps you transfer data from various sources like Salesforce (Data Source Available for Free in Hevo) in a fully automated and secure manner without having to write the code repeatedly. Hevo, with its strong integration with 100+ sources and BI tools, allows you to not only export and load data but also transform and enrich it and make it analysis-ready in a jiffy.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign up for the 14-day free trial and experience the feature-rich Hevo suite firsthand.

Tell us about your experience of connecting Kafka to Salesforce! Share your thoughts with us in the comments section below.
