Press "Enter" to skip to content

Salesforce ETL: Steps and Challenges

Salesforce is one of the most widely used cloud-based CRM platforms. It stores all your customer data, contact data, marketing leads, and more. Combining this with data from many other sources in your data warehouse can deliver valuable insights for your business. The first step, however, is to get all the data from Salesforce into a warehouse of your choice. This blog post aims to help you discover various approaches to perform Salesforce ETL. It also highlights the limitations and challenges that might come your way. The objective is to share enough knowledge to help you choose the best approach for your use case.

Methods to move data from Salesforce to Data Warehouse

Data can be copied from Salesforce to a data warehouse in either of two ways:

Method 1: Write custom ETL scripts to move data

The custom scripting approach holds the promise of control and flexibility: a team of capable developers can write scripts to pull data on the fly. This, however, comes with certain risks that are discussed in detail later.

Method 2: Use Hevo Data – a fully managed, automated Salesforce ETL Tool

With Hevo, Salesforce ETL becomes a cakewalk. Hevo is fully automated, so you do not have to write custom scripts or concern yourself with code maintenance. Hevo moves data from Salesforce to any data warehouse, such as Redshift, BigQuery, or Snowflake, with 100% reliability and consistency. This frees you from investing time and resources in overseeing data quality.

Let’s look at these two methods in greater detail.

Method 1: Copying your data from Salesforce to your data warehouse using custom scripts

Your Salesforce data is accessed through Salesforce’s vast range of APIs. Which APIs you need will be determined by the use cases you are solving for. The broad steps are as follows:

  1. Identify your data:
    • Start by familiarizing yourself with Salesforce’s developer guide.
    • Then take a deep dive into Salesforce’s universe of APIs. The most succinct description of these is found in the Salesforce API documentation.
  2. Pull your data:
    Let’s say you choose to work with the REST API. You can use tools like cURL or Postman to fetch your data. The server will typically respond with a list of available resources in JSON or XML, depending on what you asked for. The Salesforce REST API supports a language called Salesforce Object Query Language (SOQL) for executing arbitrarily complex queries; again, the result can be in either JSON or XML. More information on SOQL can be found on the Salesforce SOQL Reference pages, and a minimal extraction sketch appears just after this list. Alternatively, you may decide to work with the SOAP API, in which case you first create a SOAP client, then set up a Web Services Connector to communicate with Salesforce.

    You may also want to use Salesforce’s streaming API, so your data is always current. This powerful streaming mechanism is not without constraints. For further reading, Salesforce documentation on Streaming API has all the information you need.

    Whichever route you choose, we recommend using JSON, as most popular data warehousing solutions natively support it. Once you have decided on the API or APIs, you will need to:

    • Select the data you need to extract from the API
    • Set up a data warehouse repository and map the Salesforce data to this
    • Transform your data and stage it in the repository
    • Load your transformed data into your warehouse
  3. Data Staging
    If you are using Google BigQuery as your data warehouse, you may decide to use Google Cloud Storage as your repository or staging area for loading your data into a warehouse. Alternatively, if you are using Amazon Redshift or Snowflake you may opt for Amazon S3 as your data repository.
  4. Data Type Mapping
    You need to make sure the data types returned by the API call are supported by your data warehouse. Data transformation and type mapping are documented for the current big names in cloud-based warehousing; an illustrative mapping sketch also appears after this list:
    Google BigQuery
    Amazon Redshift
    Snowflake
    PostgreSQL
  5. Data Warehouse Schema
    You need to design the schema for your Data Warehouse (Redshift, Snowflake, or BigQuery) and map the data from your data source for maximum performance.
  6. Loading your data
    How you do this depends on which data bucket or repository you use and on your data warehouse. For instance, if you are using Google BigQuery, you can use either BigQuery’s command-line tool or a cloud SDK. If you are using Amazon S3, you will need to run a COPY command to get data into your warehouse; the staging-and-load sketch after this list illustrates one way to do this.
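To make step 2 concrete, here is a minimal extraction sketch in Python using the requests library. The instance URL, access token, and query are hypothetical placeholders; in practice, you would obtain the token through one of Salesforce’s OAuth 2.0 flows.

```python
import requests

# Hypothetical values -- substitute your own instance URL and a valid
# OAuth access token obtained through one of Salesforce's OAuth flows.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
API_VERSION = "v59.0"


def fetch_accounts():
    """Run a simple SOQL query against the Salesforce REST API."""
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/query"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    params = {"q": "SELECT Id, Name, CreatedDate FROM Account LIMIT 100"}

    records = []
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["records"])
        # Large result sets are paginated; follow nextRecordsUrl until
        # the full result set has been retrieved.
        next_url = payload.get("nextRecordsUrl")
        url = f"{INSTANCE_URL}{next_url}" if next_url else None
        params = None  # the query is already encoded in nextRecordsUrl
    return records


if __name__ == "__main__":
    for record in fetch_accounts():
        print(record["Id"], record["Name"])
```

The pagination loop matters in practice: the REST API returns query results in batches, and a script that ignores nextRecordsUrl will silently drop rows beyond the first batch.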

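For steps 4 and 5, the type mapping usually lives in your scripts as a small lookup table. The sketch below is illustrative only, pairing a handful of common Salesforce field types with plausible Amazon Redshift column types; verify every mapping against your warehouse’s documentation before relying on it.

```python
# Illustrative only -- a hand-rolled mapping from common Salesforce field
# types to Amazon Redshift column types. Verify each entry against the
# documentation for your warehouse before relying on it.
SALESFORCE_TO_REDSHIFT = {
    "id":       "VARCHAR(18)",
    "string":   "VARCHAR(255)",
    "textarea": "VARCHAR(MAX)",
    "boolean":  "BOOLEAN",
    "int":      "INTEGER",
    "double":   "DOUBLE PRECISION",
    "currency": "DECIMAL(18,2)",
    "date":     "DATE",
    "datetime": "TIMESTAMP",
    "email":    "VARCHAR(80)",
    "phone":    "VARCHAR(40)",
}


def column_ddl(field_name: str, sf_type: str) -> str:
    """Build one Redshift column definition, defaulting to VARCHAR(MAX)
    for Salesforce types not covered by the lookup table."""
    col_type = SALESFORCE_TO_REDSHIFT.get(sf_type, "VARCHAR(MAX)")
    return f'"{field_name.lower()}" {col_type}'


# Example: generate DDL for a minimal Account table (step 5).
fields = [
    ("Id", "id"),
    ("Name", "string"),
    ("AnnualRevenue", "currency"),
    ("CreatedDate", "datetime"),
]
columns = ",\n  ".join(column_ddl(name, sf_type) for name, sf_type in fields)
print(f"CREATE TABLE salesforce_account (\n  {columns}\n);")
```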
This overview and the sketches around it are just a cursory look at the primary steps involved in getting your data from Salesforce into a cloud-based warehouse. The load step, in particular, varies with your choice of staging area and warehouse.
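To round out steps 3 and 6, here is a hedged sketch of staging extracted records to Amazon S3 and loading them into Redshift with a COPY command. It assumes the boto3 and psycopg2 packages; the bucket, cluster endpoint, credentials, and IAM role are all hypothetical placeholders.

```python
import json

import boto3
import psycopg2

# Hypothetical staging location -- replace with your own bucket and key.
BUCKET = "my-etl-staging-bucket"
KEY = "salesforce/account/extract.json"


def stage_records(records):
    """Write records to S3 as newline-delimited JSON, a layout that
    Redshift's COPY command can parse with the 'auto' option."""
    body = "\n".join(json.dumps(r) for r in records)
    boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=body)


def load_into_redshift():
    """Issue a COPY so Redshift pulls the staged file from S3."""
    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",  # hypothetical
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute(f"""
            COPY salesforce_account
            FROM 's3://{BUCKET}/{KEY}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS JSON 'auto';
        """)
    conn.close()
```

Loading through a COPY command rather than row-by-row INSERTs is the standard pattern here, since Redshift ingests staged files from S3 in parallel.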

Downsides and pitfalls of Custom Coding Salesforce ETL process

Administrators and engineers write scripts to solve problems, simplify processes, and improve performance. Unfortunately, this often evolves into an over-reliance on scripting, and your entire scripting universe can grow into an unsustainable burden.

Think about the following Salesforce ETL issues and concerns:

  1. Error Handling: Glaring errors in your scripts get fixed early, but more insidious errors often go undiscovered until a process is well underway. What do you do when an error stops part of, or even all of, your data flow?
  2. Real-time Salesforce ETL: The steps above only help you load data once or at periodic intervals. Keeping your warehouse continuously in sync with Salesforce makes the process far more cumbersome and effort-intensive.
  3. Schema Changes: What is your procedure for updating schema at the warehouse in response to changes in the Salesforce API?
  4. Source/Destination Unreachable: How do you plan to prevent data loss if Salesforce or your Data Warehouse is not reachable? 
  5. Documentation: Are your scripts documented and catalogued? This becomes a key concern as engineering resources move in and out of the organization.
  6. Additional Data Requests: Would you have engineering resources ready to modify and update the scripts when more data is requested? Given how critical this data is for your business teams, it would be smart to account for such requests.

All the above limitations can be handled by opting for a Salesforce ETL platform that works out of the box. Here is how Hevo can help:

Method 2: Copying your data from Salesforce to your Data Warehouse using Hevo

 With Hevo, Salesforce data can be moved to any warehouse in 2 simple steps:

  1. Connect your Salesforce account by authorizing it.
  2. Select and authenticate the warehouse to which you want to move this data.

Advantages of using Hevo

  1. Hassle-free and code-free ETL – no scripts or cron jobs are needed
  2. Use Hevo’s AI-powered architecture to stream your Salesforce data into your cloud-based warehouse in minutes. Zero setup time.
  3. Hevo auto-detects your Salesforce schema and maps the data to your warehouse
  4. When an API changes at Salesforce’s end, Hevo detects the change and updates the warehouse schema accordingly, without any intervention needed from your end.
  5. Hevo’s real-time monitoring and alerts give status updates for data replication, detected schema changes, and more. This way, you are always up-to-date about your Salesforce ETL.
  6. 24×7 Slack and email support to ensure that you do not face any glitches in getting your Salesforce Data to your warehouse.

A Salesforce ETL tool like Hevo is key to enabling data agility, performance, and reliability in an organization. Hevo’s fault-tolerant, dependable data integration platform gives you a trouble-free environment in which you can move data from any source to any destination. (www.hevodata.com/integrations)

Explore hassle-free Salesforce ETL by signing up for a 14-day free trial.
