Unlock the full potential of your CleverTap data by integrating it seamlessly with Databricks. With Hevo’s automated pipeline, your data flows effortlessly. Watch our 1-minute demo below to see it in action!

Imagine having all your customer engagement data in one place, ready to be analyzed and turned into actionable insights. That’s what connecting CleverTap with Databricks gives you. Whether you’re deep-diving into customer behavior, personalizing marketing efforts, or simply getting a clearer picture of what works and what doesn’t, this integration makes it much easier. Big data analytics, in turn, powers stronger strategies and faster, better-informed decisions. And because Databricks is built to scale, you can handle even the largest datasets.

Look no further. This article gives you a step-by-step guide to connecting CleverTap to Databricks quickly and effectively, so you can deliver data to your marketing team.

Seamlessly Integrate CleverTap with Databricks Using Hevo

Take the hassle out of data integration with Hevo’s no-code platform. Effortlessly connect CleverTap to Databricks and streamline your data flow for deeper insights and analytics.

Why Choose Hevo?

  • No-Code Simplicity: Set up data pipelines without writing any code, making integration smooth and easy.
  • Real-Time Syncing: Keep your data fresh with continuous syncing between CleverTap and Databricks.
  • Effortless Scalability: Scale your data operations without losing efficiency or compromising performance.

Unlock the true potential of your CleverTap data with Hevo and take your analytics to the next level.

Get Started with Hevo for Free

Replicate Data from CleverTap to Databricks Using CleverTap APIs

To start replicating data from CleverTap to Databricks, you need to use one of the CleverTap APIs. In this example, we will be using the Event Export API:

  • Step 1: Data in CleverTap can be exported as JSON with the help of API credentials. CleverTap provides the Event Export API to retrieve this data. First, install the requests library, then call the following endpoint: 
$ python -m pip install requests

import requests

headers = {
    'X-CleverTap-Account-Id': 'ACCOUNT_ID',
    'X-CleverTap-Passcode': 'PASSCODE',
    'Content-Type': 'application/json',
}

response = requests.get('https://in1.api.clevertap.com/1/events.json?cursor=CURSOR', headers=headers)
  • $ python -m pip install requests: This command installs the requests Python library, used for making HTTP requests.
  • import requests: Imports the requests module to use for making web requests.
  • headers: This dictionary stores the request headers. It includes:
    • X-CleverTap-Account-Id: The account ID for CleverTap.
    • X-CleverTap-Passcode: The passcode for authentication.
    • Content-Type: Specifies the format of data being sent, here it’s JSON.
  • requests.get(...): Sends an HTTP GET request to the CleverTap API to retrieve event data.
  • The URL includes a query parameter cursor, which is used for paginating through events; each response returns a next_cursor value that you pass to fetch the next batch (see the pagination sketch after the sample response below).
  • headers=headers: Sends the defined headers with the request.
  • response: Stores the server’s response to the GET request.

You need to modify the script slightly to store the response as a JSON file: 


import requests

headers = {
    'X-CleverTap-Account-Id': 'ACCOUNT_ID',
    'X-CleverTap-Passcode': 'PASSCODE',
    'Content-Type': 'application/json',
}

response = requests.get('https://in1.api.clevertap.com/1/events.json?cursor=CURSOR', headers=headers)

import json

json_data = response.json()
with open('personal.json', 'w') as json_file:
    json.dump(json_data, json_file)
  • Sends a GET request to the CleverTap API to get event data.
  • Uses headers for authentication.
  • Parses the response body as JSON and writes it to a personal.json file.

JSON Response:

{
    "status": "success",
    "next_cursor": "ZyZjfwYEAgdjYmZyKz8NegYFAwxmamF%2FZ21meU4BBQFlYmN7ZG5ifAYCTQQrai57K2ouegJMABl6",
    "records": [
        {
            "profile": {
                "objectId": "a8ffcbc9-a747-4ee3-a791-c5e58ad03097",
                "platform": "Web",
                "email": "peter@foo.com",
                "profileData": {
                    "favoriteColor": "blue"
                },
                "identity": "5555555555",
                "id": 33
            },
            "ts": 20151023140416,   // yyyyMMddHHmmSS format
            "event_props": {
                "value": "pizza"
            },
            "session_props": {
                "utm_source": "Foo",
                "utm_medium": "Bar",
                "utm_campaign": "FooBar",
                "session_source": "Direct"
            }
        },
        {
            "profile": {
                "objectId": "a8ffcbc9-a747-4ee3-a791-c5e58ad03097",
                "platform": "Web",
                "email": "peter@foo.com",
                "profileData": {
                    "favoriteColor": "blue"
                },
                "identity": "5555555555",
                "id": 33
            },
            "ts": 20151024121636,
            "event_props": {
                "value": "pizza"
            },
            "session_props": {
                "session_source": "Others",
                "session_referrer": "http://example.com"
            }
        }
    ]
}
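
Since each response includes a next_cursor field (as in the sample above), you can keep calling the endpoint until all events have been returned. Here is a minimal, hypothetical pagination sketch; the region (in1), the initial CURSOR value, and the clevertap_events.json file name are assumptions you should replace with your own details:

import json
import requests

headers = {
    'X-CleverTap-Account-Id': 'ACCOUNT_ID',
    'X-CleverTap-Passcode': 'PASSCODE',
    'Content-Type': 'application/json',
}

base_url = 'https://in1.api.clevertap.com/1/events.json'
cursor = 'CURSOR'  # initial cursor for the export
all_records = []

while cursor:
    # Fetch the next batch of events for the current cursor
    response = requests.get(base_url, params={'cursor': cursor}, headers=headers)
    data = response.json()
    all_records.extend(data.get('records', []))
    # next_cursor is absent once every event has been returned
    cursor = data.get('next_cursor')

# Write all fetched events to one file for loading into Databricks
with open('clevertap_events.json', 'w') as json_file:
    json.dump(all_records, json_file)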
  • Step 2: You can read JSON files in single-line or multi-line mode in Databricks. In single-line mode, a file can be split into multiple parts and read in parallel. In multi-line mode, a file is read as a single entity and cannot be split.

To read the JSON file in a single-line format, you can use the following command in Scala: 

val df = spark.read.format("json").load("your-file-name.json")
  • val df: Creates a DataFrame called df.
  • spark.read.format("json"): Specifies that the data is in JSON format.
  • load("your-file-name.json"): Loads the JSON file into the DataFrame df.

To read the JSON file in a multi-line format, you can use the following command in Scala: 

val mdf = spark.read.option("multiline", "true").format("json").load("/tmp/your-file-name.json")
  • val mdf: Creates a DataFrame called mdf.
  • spark.read.option("multiline", "true"): Tells Spark to read multi-line JSON files correctly.
  • format("json"): Specifies that the file format is JSON.
  • load("/tmp/your-file-name.json"): Loads the JSON file from the specified path into the DataFrame mdf.

Although the charset is detected automatically in Databricks, you can also specify it explicitly using the charset option:

spark.read.option("charset","UTF-16BE").format("json").load("your-file-name.json")
  • spark.read.option("charset", "UTF-16BE"): Specifies the character encoding as UTF-16BE for reading the file.
  • format("json"): Indicates the file format is JSON.
  • load("your-file-name.json"): Loads the JSON file into a Spark DataFrame.
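
If you prefer Python over Scala, here is a minimal PySpark sketch that reads the exported file and persists it as a table for downstream analysis. The /tmp/clevertap_events.json path and the clevertap_events table name are assumptions; adjust them to wherever you uploaded the file:

# In a Databricks notebook, `spark` is the pre-created SparkSession
df = spark.read.format("json").load("/tmp/clevertap_events.json")

# Inspect the inferred schema and a few rows before persisting
df.printSchema()
df.show(5, truncate=False)

# Save the events as a managed table (hypothetical name) so they can be queried with SQL
df.write.mode("overwrite").saveAsTable("clevertap_events")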

Using CleverTap APIs to replicate data from CleverTap to Databricks is optimal for the following scenarios:

  • APIs can be wrapped in customized scripts and deployed with detailed instructions for completing each stage of the workflow.

In the following scenarios, using the CleverTap APIs might be cumbersome and not a wise choice:

  • Updating the existing API calls and managing workflows requires immense engineering bandwidth and hence can be a pain point for many users. Maintaining APIs is costly in terms of development, support, and updating.

When the frequency of replicating data from CleverTap increases, this process becomes highly monotonous. It adds to your misery when you have to transform the raw data every single time. With the increase in data sources, you would have to spend a significant portion of your engineering bandwidth creating new data connectors. Just imagine: building custom connectors for each source, transforming and processing the data, tracking each data flow individually, and fixing issues. Doesn’t it sound exhausting?


How about you focus on more productive tasks than repeatedly writing custom ETL scripts? This sounds good, right?

In these cases, you can… 

Replicate Data from CleverTap to Databricks Using an Automated ETL Tool like Hevo

Here are the benefits of leveraging a no-code tool:

  • Automated pipelines allow you to focus on core engineering objectives while your business teams can directly work on reporting without any delays or data dependency on you.
  • Automated pipelines provide a beginner-friendly UI. Tasks like configuring and establishing connections with the source and destination, providing credentials and authorization details, and performing schema mapping are a lot simpler with this UI. It frees the engineering team’s bandwidth from tedious preparation tasks.

For instance, here’s how Hevo, a cloud-based ETL tool, makes CleverTap to Databricks data replication ridiculously easy:

Step 1: Configure CleverTap as a Source

Authenticate and configure your CleverTap source.

Configure CleverTap as Source

Step 2: Configure Databricks as a Destination

In the next step, we will configure Databricks as the destination.

Connect Databricks as Destination

Once your CleverTap to Databricks ETL pipeline is configured, Hevo will collect new and updated data from CleverTap at the default pipeline frequency of one hour and replicate it into Databricks. Depending on your needs, you can adjust the pipeline frequency anywhere from 15 minutes to 24 hours, as shown below.

Data Replication Frequency

Default Pipeline Frequency: 1 Hr
Minimum Pipeline Frequency: 15 Mins
Maximum Pipeline Frequency: 24 Hrs
Custom Frequency Range (Hrs): 1-24

In a matter of minutes, you can complete this no-code, automated approach of connecting CleverTap to Databricks using Hevo and start analyzing your data.

Hevo offers 150+ plug-and-play connectors (including 40+ free sources). It efficiently replicates your data from CleverTap, databases, and data warehouses to a destination of your choice, like Databricks, in a hassle-free and automated manner. Hevo’s fault-tolerant architecture ensures that the data is handled securely and consistently with zero data loss. It also enriches the data and transforms it into an analysis-ready form without your having to write a single line of code.

Hevo’s reliable data pipeline platform enables you to set up zero-code and zero-maintenance data pipelines that just work. Here’s what allows Hevo to stand out in the marketplace:

  • Fully Managed: You don’t need to dedicate time to building your pipelines. With Hevo’s dashboard, you can monitor all the processes in your pipeline, thus giving you complete control over it.
  • Data Transformation: Hevo provides a simple interface to cleanse, modify, and transform your data through drag-and-drop features and Python scripts. It can accommodate multiple use cases with its pre-load and post-load transformation capabilities.
  • Faster Insight Generation: Hevo offers near real-time data replication, so you have access to real-time insight generation and faster decision-making. 
  • Schema Management: With Hevo’s auto schema mapping feature, all your mappings will be automatically detected and managed in the destination schema.
  • Scalable Infrastructure: With the increase in the number of sources and volume of data, Hevo can automatically scale horizontally, handling millions of records per minute with minimal latency.
  • Transparent pricing: You can select your pricing plan based on your requirements. Different plans are clearly put together on its website, along with all the features it supports. You can adjust your credit limits and spend notifications for any increased data flow.
  • Live Support: The support team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.

What Can You Achieve by Migrating Your Data from Clevertap to Databricks?

Here’s a little something for the data analyst on your team. We’ve mentioned a few core insights you could get by replicating data from Clevertap to Databricks. Does your use case make the list?

  • What is the marketing behavioral profile of your product’s top users?
  • What message would take a customer from one lifecycle stage to another?
  • How likely is a lead to purchase a product?
  • What was the engagement rate of app users by channel for a survey?


Summing It Up

CleverTap APIs are the right path when your team needs data from CleverTap only once in a while. However, an ETL solution becomes necessary if the source changes rapidly and frequent data replication is needed to meet the data demands of your product or marketing channels. You can free your engineering bandwidth from these repetitive and resource-intensive tasks with Hevo’s 150+ plug-and-play integrations.

Saving countless hours of manual data cleaning and standardizing, Hevo’s pre-load data transformations get it done in minutes via a simple drag-and-drop interface or your custom Python scripts. No need to go to your data warehouse for post-load transformations. You can simply run complex SQL transformations from the comfort of Hevo’s interface and get your data in its final, analysis-ready form. 

Want to take Hevo for a ride? Try Hevo’s 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.

Share your experience of replicating data from CleverTap to Databricks! Let us know in the comments section below!

FAQs

1. How do I transfer data to Databricks?

You can use tools like Hevo to automatically transfer data to Databricks in real-time, simplifying the process without any coding.

2. What tools can connect to Databricks?

Tools like Hevo, Apache Spark, SQL clients, and BI tools like Tableau can seamlessly connect to Databricks for data integration.

3. What can you do with CleverTap?

CleverTap helps businesses track user behavior and send personalized messages, and with Hevo, you can connect CleverTap data to Databricks for deeper analysis.

Harsh Varshney
Research Analyst, Hevo Data

Harsh is a data enthusiast with over 2.5 years of experience in research analysis and software development. He is passionate about translating complex technical concepts into clear and engaging content. His expertise in data integration and infrastructure shines through his 100+ published articles, helping data practitioners solve challenges related to data engineering.