Salesforce provides customer relationship management software as a subscription-based cloud service, thereby eliminating the need for organizations to have their own infrastructure and software deployment. Even though Salesforce provides an analytics suite along with its offerings, most organizations will need to combine their customer data from Salesforce to data elements from various internal and external sources for decision making. This can only be done by importing Salesforce data into a data warehouse or database. In this post, we will look at the steps involved in loading data from Salesforce to PostgreSQL, one of the most famous relational databases.
Two approaches to move data from Salesforce to PostgreSQL
There are two broad methods that can help you load data from Salesforce to PostgreSQL:
Method 1: Writing custom ETL scripts
Method 2: Implementing a Fully-Managed Data Integration solution, Hevo Data
This blog describes method 1 in great detail. It also highlights the challenges involved in implementing this approach. Towards the end, you will also find more details on Method 2 so that you explore all your options and choose wisely.
Loading Data from Salesforce to PostgreSQL using Custom Code:
Interacting with Salesforce
The best way to interact with Salesforce is to use the different APIs provided by Salesforce itself. It also provides some utilities to deal with the data. The following section attempts to provide an overview of these APIs and utilities.
Salesforce REST APIs
Salesforce REST APIs are a set of web services that help to insert/delete, update and query Salesforce objects. To implement a custom application using Salesforce in mobile or web ecosystem, these REST APIs are the preferred method.
Salesforce SOAP APIs
SOAP APIs can establish formal contracts of API behaviour through the use of WSDL. Typically Salesforce SOAP APIs are when there is a requirement for stateful APS or in case of strict transactional reliability requirement. SOAP APIs are also sometimes used when the organization’s legacy applications mandate the protocol to be SOAP.
Salesforce BULK APIs
Salesforce BULK APIs are optimized for dealing with a large amount of data ranging up to GBs. These APIs can run in a batch mode and can work asynchronously. They provide facilities for checking the status of batch runs and retrieving the results as large text files. BULK APIs can insert, update, delete or query records just like the other two types of APIs.
Salesforce Data Loader
Data Loader is a Salesforce utility that can be installed on the desktop computer. It has functionalities to query and export the data to CSV files. Internally this is accomplished using the bulk APIs.
Loading data from Salesforce to PostgreSQL – Step by Step
Login to Salesforce using the SOAP API and get the session id. For logging in first create an XML file named login.txt in the below format.
<?xml version="1.0" encoding="utf-8" ?> <env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"> <env:Body> <n1:login xmlns:n1="urn:partner.soap.sforce.com"> <n1:username>username</n1:username> <n1:password>password</n1:password> </n1:login> </env:Body> </env:Envelope>
Execute the below command to login
curl https://login.Salesforce.com/services/Soap/u/47.0 -H "Content-Type: text/xml; charset=UTF-8" -H "SOAPAction: login" -d @login.txt
From the result XML, note the session id. We will need the session id for the later requests.
Create a BULK API job. For creating a job, a text file with details of the objects that are to be accessed is needed. Create the text file using the below template.
<?xml version="1.0" encoding="UTF-8"?> <jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <operation>insert</operation> <object>Contact</object> <contentType>CSV</contentType> </jobInfo>
We are attempting to pull data from the object Contact in this exercise.
Execute the below command after creating the job.txt
curl https://instance.Salesforce.com/services/async/47.0/job -H "X-SFDC-Session: sessionId" -H "Content-Type: application/xml; charset=UTF-8" -d @job.txt
From the result, note the job id. This job-id will be used to form the URL for subsequent requests. Please note the URL will change according to the URL of the user’s Salesforce organization.
Create the SQL query to pull the data and use it with CURL as given below.
curl https://instance_name—api.Salesforce.com/services/async/APIversion/job/jobid/batch -H "X-SFDC-Session: sessionId" -H "Content-Type: text/csv; SELECT name,desc from Contact
The next step is to close the job. This requires a text file with details of the job status change. Create it as below with the name close_job.txt.
<?xml version="1.0" encoding="UTF-8"?> <jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <state>Closed</state> </jobInfo>
Use the file with the below command.
curl https://instance.Salesforce.com/services/async/47.0/job/jobId -H "X-SFDC-Session: sessionId" -H "Content-Type: application/xml; charset=UTF-8" -d @close_job.txt
Access the resulting API and fetch the result is of the batch.
curl -H "X-SFDC-Session: sessionId" https://instance.Salesforce.com/services/async/47.0/job/jobId/batch/batchId/result
Retrieve the actual results using the result id that was fetched from the above step.
curl -H "X-SFDC-Session: sessionId" https://instance.Salesforce.com/services/async/47.0/job/jobId/batch/batchId/result/resultId
The output will be a CSV file with the required rows of data. Save it as Contacts.csv in your local filesystem.
Load data to Postgres using the COPY command. Assuming the table is already created this can be done by executing the below command.
COPY Contacts(name,desc,) FROM 'contacts.csv' DELIMITER ',' CSV HEADER;
An alternative to using the above sequence of API calls is to use the Data Loader utility to query the data and export it to CSV. But in case you need to do this programmatically, Data Loader utility will be of little help.
Limitations of the Above Approach
As evident from the above steps, loading data through the manual method contains a significant number of steps that could be overwhelming if you are looking to do this on a regular basis. You would need to configure additional scripts in case you need to bring data into real-time.
A Simpler Alternative to Sync Data from Salesforce to PostgreSQL
An easier way to accomplish the same result is to use a code-free data pipeline platform like Hevo Data that can implement a sync in a couple of clicks.
Hevo does all heavy-weightlifting and masks all the data migration complexities to securely and reliably deliver the data into your PostgreSQL database in real-time. By providing analysis-ready data in PostgreSQL, Hevo helps you stop worrying about your data and start uncovering insights in real-time.
You can sign up for a 14-day free trial to experience the simplicity yourself.
With Hevo, you could move data in just 2 steps:
- Authenticate and configure your Salesforce data source
- Configure your PostgreSQL destination where the data needs to be loaded
Hevo takes care of handling everything from schema management to data flow monitoring data rids you of any maintenance overhead. In addition to Salesforce, you can bring data from 100s of different sources into PostgreSQL in real-time, ensuring that all your data is available for analysis with you.
What are your thoughts on the two approaches to move data from Salesforce to PostgreSQL? Let us know in the comments.