Building an entirely new data connector is difficult, especially when you’re already swamped with monitoring and maintaining your existing custom data pipelines. When your marketing team sends an ad-hoc Marketo to Databricks connection request, you have to spend engineering bandwidth on this manual task. We understand you are pressed for time and need a quick solution. Writing GET API requests and downloading and uploading JSON files by hand is manageable once, but it becomes repetitive when done regularly. Alternatively, you can use an automated solution for the same job.

Well, you’ve landed in the right place! We’ve prepared a simple and straightforward guide to help you replicate data from Marketo to Databricks. 

How to Replicate Data From Marketo to Databricks?

To replicate data from Marketo to Databricks, you can either:

  • Use REST APIs, or
  • Use a no-code automated solution.

We’ll cover replication via REST APIs next.

Replicate Data from Marketo to Databricks Using REST APIs

Marketo, being a Marketing Automation platform, stores data comprehensively in a lead database and as assets.

The objects in the lead database are:

  • Leads
  • Companies/Accounts
  • Named Accounts
  • Opportunities
  • OpportunityRoles
  • SalesPersons
  • Custom Objects
  • Activities
  • List and Program Membership

Whereas Marketo assets include the following objects:

  • Folders
  • Programs
  • Emails
  • Email Templates
  • Landing Pages
  • Landing Page Templates
  • Snippets
  • Forms
  • Tokens
  • Files

Let’s dive into the process of replicating these objects from Marketo to Databricks using REST APIs in JSON format:

Step 1: Export Data from Marketo

For users who want to export data programmatically, Marketo offers two types of REST APIs:

  • Lead Database APIs fetch data from Marketo person records and associated object types such as Opportunities and Companies.
  • Asset APIs let you work with marketing collateral and workflow-related records.

Let’s dive into the steps for exporting data from Marketo using REST APIs.

  • Visit the Admin panel in Marketo. Then select the “LaunchPoint” button to use the APIs.
  • Now, get an authorization token by creating a new service.
  • Now, select the API that corresponds to the data you require.
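
The token step above can be sketched in Python. This is a minimal illustration, not an official client: it assumes your instance exposes an identity URL (shown alongside the REST endpoint in the Admin panel) and that the custom service you created provides a client ID and client secret. The function names and the example URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_url(identity_url: str, client_id: str, client_secret: str) -> str:
    """Compose the client-credentials token URL for a Marketo instance.

    identity_url is an assumption: the identity endpoint of your own instance.
    """
    query = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )
    return f"{identity_url.rstrip('/')}/oauth/token?{query}"


def fetch_access_token(identity_url: str, client_id: str, client_secret: str,
                       timeout: int = 30) -> str:
    """Call the token URL and return the access_token field of the JSON reply."""
    url = build_token_url(identity_url, client_id, client_secret)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["access_token"]
```

Tokens expire after a while, so a scheduled job would typically request a fresh one at the start of each run rather than storing it.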

For example, you could use an API path such as /rest/v1/opportunities.json to get a list of opportunities. To customize the data Marketo returns, you can add optional filters as query parameters.

The following is an example of a well-formed REST URL in Marketo: <base URL>/v1/lead/318582.json?fields=email,firstName,lastName

The above URL is composed of the following parts:

  • Base URL: the REST endpoint of your Marketo instance
  • Path:  /v1/lead/
  • Resource:  318582.json
  • Query parameter:  fields=email,firstName,lastName
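
To make the four parts concrete, here is a small sketch that assembles them into one URL. The helper name build_rest_url and the base URL used in the usage note are hypothetical; substitute the REST endpoint of your own instance.

```python
from urllib.parse import urlencode


def build_rest_url(base_url, path, resource, fields=None):
    """Join base URL, path, and resource, then append an optional fields filter."""
    url = f"{base_url.rstrip('/')}/{path.strip('/')}/{resource}"
    if fields:
        # Keep commas unescaped so the query reads fields=a,b,c.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url
```

For instance, build_rest_url("https://example.invalid/rest", "/v1/lead/", "318582.json", ["email", "firstName", "lastName"]) yields a URL ending in /v1/lead/318582.json?fields=email,firstName,lastName.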

In this same format, you can write the REST URL to fetch data from the Opportunities object.

The following are the GET request API paths and resources for fetching data from the Opportunities object:

  • Get Opportunity Field by Name: /rest/v1/opportunities/schema/fields/{fieldApiName}.json
  • Get Opportunity Fields: /rest/v1/opportunities/schema/fields.json
  • Get Opportunities:  /rest/v1/opportunities.json
  • Describe Opportunity: /rest/v1/opportunities/describe.json
  • Get Opportunity Roles: /rest/v1/opportunities/roles.json
  • Describe Opportunity Role: /rest/v1/opportunities/roles/describe.json
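
As an illustration, a GET request against the Get Opportunities path could be prepared as below. This is a sketch under assumptions: the token is passed as a Bearer header, and the query uses filterType and filterValues parameters, which is how we understand Marketo expects opportunity lookups to be filtered. Check the endpoint reference for your instance before relying on either. The function names and instance_url are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

OPPORTUNITIES_PATH = "/rest/v1/opportunities.json"


def build_opportunities_request(instance_url, access_token, filter_type, filter_values):
    """Prepare (but do not send) a GET request for the Opportunities object."""
    query = urlencode(
        {"filterType": filter_type, "filterValues": ",".join(filter_values)},
        safe=",",  # assumption: multiple filter values are comma-separated
    )
    url = f"{instance_url.rstrip('/')}{OPPORTUNITIES_PATH}?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def get_opportunities(instance_url, access_token, filter_type, filter_values, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_opportunities_request(
        instance_url, access_token, filter_type, filter_values
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the URL and header logic easy to inspect without touching the network.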

After making a GET API request to fetch data from Marketo, the data is returned in JSON format. An excerpt of the data returned by an Opportunities query (some fields omitted) is as follows:

      {
         "marketoGUID":"dff23271-f996-47d7-984f-f2676861b6fa ",
         ...
         "source":"Inbound Sales Call/Email"
      },
      {
         "marketoGUID":"dff23271-f996-47d7-984f-f2676861b5fc ",
         "name":"Big Dog Day Care-Phase12",
         "description":"Big Dog Day Care-Phase12",
         ...
      }

Step 2: Prepare Marketo Data

You’ll need to construct a schema for your data tables if you don’t already have one for storing the data you obtain. Then, for each value in the response, you must identify a predefined datatype (INTEGER, DATETIME, etc.) and create a table to receive it. Marketo’s documentation should list the fields and datatypes available from each endpoint.
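
One way to draft such a schema is to infer a datatype from sample values and generate the table definition from them. The sketch below is a rough starting point, not a substitute for the documented field types: the mapping to Databricks SQL types (STRING, BIGINT, DOUBLE, BOOLEAN, TIMESTAMP) is our own simplification, and the function names are hypothetical.

```python
from datetime import datetime


def infer_sql_type(value):
    """Map a decoded JSON value to a Databricks SQL type name (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        try:
            # Treat ISO 8601 strings such as 2015-04-06T22:05:28Z as timestamps.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "TIMESTAMP"
        except ValueError:
            return "STRING"
    return "STRING"  # nested objects and lists are kept as text in this sketch


def build_create_table(table_name, records):
    """Build a CREATE TABLE statement from the first non-null value of each field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            if value is not None:
                columns.setdefault(field, infer_sql_type(value))
    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n  {column_sql}\n)"
```

Fields that are null in every sampled record are left out, so it is worth comparing the result with the field list returned by the Describe endpoint for the object.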

Alternatively, you can simply save the API response as a JSON file on your local system and let Databricks read the schema when you import it.
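
A sketch of saving the response locally follows. It assumes the response wraps its records in a result list next to a success flag, which is how Marketo REST responses are commonly structured; verify this against a real response from your instance. Records are written one per line (JSON Lines), since that is the layout Spark's JSON reader expects by default. The function names are hypothetical.

```python
import json
from pathlib import Path


def extract_records(response_text):
    """Return the list of records from a response body, or raise on failure."""
    payload = json.loads(response_text)
    if not payload.get("success", False):
        raise RuntimeError(f"Marketo request failed: {payload.get('errors')}")
    records = payload.get("result", [])
    # Trim stray whitespace such as the trailing space seen in some GUID values.
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in record.items()}
        for record in records
    ]


def write_json_lines(records, path):
    """Write one JSON object per line and return the number of records written."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

If you prefer to keep the raw, pretty-printed response instead, Spark can still read it with the multiLine option enabled.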

Step 3: Import JSON Files into Databricks

  • In the Databricks UI, go to the side navigation bar. Click on the “Data” option. 
  • Now, you need to click on the “Create Table” option.
  • Then drag the required JSON files to the drop zone. Otherwise, you can browse the files in your local system and then upload them.

Once the JSON files are uploaded, your file path will look like: /FileStore/tables/<fileName>-<integer>.<fileType>

Creating table while exporting data from Marketo to Databricks

If you click on the “Create Table with UI” button, then follow along:

  • Then select the cluster where you want to preview your table.
  • Click on the “Preview Table” button. Then, specify the table attributes such as table name, database name, file type, etc.
  • Then, select the “Create Table” button.
  • Now, the database schema and sample data will be displayed on the screen.

If you click on the “Create Table in Notebook” button, then follow along:

  • A Python notebook is created in the selected cluster.
Python Notebook in Databricks

The above picture shows a CSV file being imported to Databricks. However, in this case, it’s a JSON file.

  • You can edit the table attributes and format using the necessary Python code. You can refer to the below image for reference.
Displaying attributes in Databricks
  • You can also run queries on SQL in the notebook to understand the data frame and its description.
Running SQL queries in the notebook

In the image, the table is named “emp_csv.” In our case, we can name it “abs_json.”

  • Now, from the Spark DataFrame that the notebook creates, you need to create and save your table in the default database or any other database of your choice.
Saving the table in the notebook

In the above image, “mytestdb” is the database where we intend to save our table.

  • After you save the table, you can click on the “Data” button in the left navigation pane and check whether the table has been saved in the database of your choice.
Checking the table in the database
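
The notebook steps above can be condensed into a single function. This is a minimal sketch rather than the notebook Databricks generates for you: it assumes a spark session object is available (as it is inside a Databricks notebook), and the function name, the temporary view name, and the overwrite mode are our own choices.

```python
def load_json_as_table(spark, file_path, database, table, multiline=False):
    """Read an uploaded JSON file and save it as a table in the given database."""
    # multiline=True is needed when the file is one pretty-printed JSON document
    # rather than one JSON object per line.
    df = spark.read.option("multiLine", multiline).json(file_path)

    # A temporary view lets you explore the data with SQL before saving it.
    df.createOrReplaceTempView(f"{table}_view")

    spark.sql(f"CREATE DATABASE IF NOT EXISTS {database}")
    # Assumption: replacing any existing table of the same name is acceptable.
    df.write.mode("overwrite").saveAsTable(f"{database}.{table}")
    return df
```

In a notebook cell, you would call it with the uploaded file's path under /FileStore/tables/, for example load_json_as_table(spark, file_path, "mytestdb", "abs_json"), and then run spark.sql("DESCRIBE TABLE mytestdb.abs_json") to inspect the resulting schema.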

Step 4: Modify & Access the Data

  • The data now gets uploaded to Databricks. You can access the data via the Import & Explore Data section on the landing page.
For modifying and accessing the data in Databricks
  • To modify the data, select a cluster and click on the “Preview Table” option.
  • Then, change the attributes accordingly and select the “Create Table” option.

With the above 4-step approach, you can easily replicate data from Marketo to Databricks using REST APIs. This method performs exceptionally well in the following scenarios:

  • Low-frequency Data Replication: This method is appropriate when your marketing team needs the Marketo data only once in an extended period, i.e., monthly, quarterly, yearly, or just once. 
  • Dedicated Personnel: If your organization has dedicated people who can manually write GET API requests and download and upload JSON data, then accomplishing this task is not much of a headache.
  • Low Volume Data: It can be a tedious task to repeatedly write API requests for different objects and download & upload JSON files. Moreover, merging these JSON files from multiple departments is time-consuming if you are trying to measure the business’s overall performance. Hence, this method is optimal for replicating only a few files.

When the frequency of replicating data from Marketo increases, this process becomes highly monotonous. It adds to your misery when you have to transform the raw data every single time. With the increase in data sources, you would have to spend a significant portion of your engineering bandwidth creating new data connectors. Just imagine — building custom connectors for each source, transforming & processing the data, tracking the data flow individually, and fixing issues. Doesn’t it sound exhausting?

Instead, you should be focusing on more productive tasks. Being relegated to the role of a ‘Big Data Plumber’ who spends most of their time repairing and creating data pipelines might not be the best use of your time.

To start reclaiming your valuable time, you can…

Replicate Data from Marketo to Databricks Using an Automated ETL Tool

Going all the way to write custom scripts for every new data connector request is not the most efficient and economical solution. Frequent breakages, pipeline errors, and lack of data flow monitoring make scaling such a system a nightmare.

You can streamline the Marketo to Databricks data integration process by opting for an automated tool. Here are the benefits of leveraging an automated no-code tool:

  • It allows you to focus on core engineering objectives while your business teams can jump on to reporting without any delays or data dependency on you.
  • Your sales & support teams can effortlessly enrich, filter, aggregate, and segment raw Marketo data with just a few clicks.
  • The beginner-friendly UI saves the engineering team hours of productive time lost due to tedious data preparation tasks.
  • Without coding knowledge, your analysts can seamlessly create thorough reports for various business verticals to drive better decisions. 
  • Your business teams get to work with near-real-time data with no compromise on the accuracy & consistency of the analysis. 
  • You get all your analytics-ready data in one place. With this, you can quickly measure your business performance and deep dive into your Marketo data to explore new market opportunities.

For instance, here’s how Hevo Data, a cloud-based ETL tool, makes Marketo to Databricks data replication ridiculously easy:

Step 1: Configure Marketo as a Source

Configuring Marketo as a source

Step 2: Configure Databricks as a Destination

Configure Databricks as a destination

All Done to Set Up Your ETL Pipeline

After implementing the 2 simple steps, Hevo Data will take care of building the pipeline for replicating data from Marketo to Databricks based on the inputs given by you while configuring the source and the destination.

The pipeline will automatically replicate new and updated data from Marketo to Databricks every 3 hrs (by default). However, you can also adjust the data replication frequency as per your requirements.

Data Pipeline Frequency

Default Pipeline Frequency | Minimum Pipeline Frequency | Maximum Pipeline Frequency | Custom Frequency Range (Hrs)
3 Hrs | 3 Hrs | 48 Hrs | 3-48

For in-depth knowledge of how a pipeline is built & managed in Hevo Data, you can also visit the official documentation for Marketo as a source and Databricks as a destination.

You don’t need to worry about security and data loss. Hevo’s fault-tolerant architecture will stand as a solution to numerous problems. It will enrich your data and transform it into an analysis-ready form without having to write a single line of code.

By employing Hevo to simplify your data integration needs, you can leverage its salient features:

  • Reliability at Scale: With Hevo Data, you get a world-class fault-tolerant architecture that scales with zero data loss and low latency. 
  • Monitoring and Observability: Monitor pipeline health with intuitive dashboards that reveal every state of the pipeline and data flow. Bring real-time visibility into your ELT with Alerts and Activity Logs. 
  • Stay in Total Control: When automation isn’t enough, Hevo Data offers flexibility – data ingestion modes, ingestion and load frequency, JSON parsing, destination workbench, custom schema management, and much more – for you to have total control.
  • Auto-Schema Management: Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo Data automatically maps the source schema with the destination warehouse so that you don’t face the pain of schema errors.
  • 24×7 Customer Support: With Hevo Data, you get more than just a platform, you get a partner for your pipelines. Discover peace with round-the-clock “Live Chat” within the platform. Moreover, you get 24×7 support even during the 14-day full-feature free trial.
  • Transparent Pricing: Say goodbye to complex and hidden pricing models. Hevo Data’s Transparent Pricing brings complete visibility to your ELT spending. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow. 
Get started for Free with Hevo Data!

What Can You Achieve by Replicating Your Data from Marketo to Databricks?

Replicating data from Marketo to Databricks can help your data analysts get critical business insights. Does your use case make the list?

  • Customers acquired from which channel have the maximum satisfaction ratings?
  • How does customer SCR (Sales Close Ratio) vary by Marketing campaign?
  • How many orders were completed from a particular Geography?
  • How likely is the lead to purchase a product?
  • What is the Marketing Behavioural profile of the Product’s Top Users?

Summing It Up

Collecting an API key, sending a GET request through REST APIs, and then downloading, transforming, and uploading the JSON data is a smooth enough process when your marketing team requires data from Marketo only once in a while. But what if the marketing team frequently requests data from multiple objects with numerous filters on the Marketo data? Going through this process over and over again can be monotonous and would eat up a major portion of your engineering bandwidth. The situation worsens when these requests are for replicating data from multiple sources.

So, would you carry on with this method of manually writing GET API requests every time you get a request from the Marketing team? You can stop spending so much time being a ‘Big Data Plumber’ by using a custom ETL solution instead.

A custom ETL solution becomes necessary for real-time data demands such as monitoring email campaign performance or viewing the sales funnel. You can free your engineering bandwidth from these repetitive & resource-intensive tasks by selecting Hevo Data’s 150+ plug-and-play integrations (including 40+ free sources such as Marketo).

Visit our Website to Explore Hevo Data

Hevo Data’s pre-load data transformations save countless hours of manual data cleaning and standardizing, getting it done in minutes via a simple drag-and-drop interface or your custom Python scripts. No need to go to your data warehouse for post-load transformations. You can run complex SQL transformations from the comfort of Hevo Data’s interface and get your data in the final analysis-ready form.

Want to take Hevo Data for a spin? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.

Share your experience of connecting Marketo to Databricks! Let us know in the comments section below!

Manisha Jena
Research Analyst, Hevo Data

Manisha Jena is a data analyst with over three years of experience in the data industry and is well-versed with advanced data tools such as Snowflake, Looker Studio, and Google BigQuery. She is an alumna of NIT Rourkela and excels in extracting critical insights from complex databases and enhancing data visualization through comprehensive dashboards. Manisha has authored over a hundred articles on diverse topics related to data engineering, and loves breaking down complex topics to help data practitioners solve their doubts related to data engineering.
