Hevo Makes Data Pipelines a Cakewalk for Data Engineers

on Product • November 12th, 2017

Introducing Pipelines

In most businesses today, data exists in multiple silos. To fully leverage the raw data and get real insights, organizations need to bring these systems together in one place and then build analytics infrastructure on top of it. Here is an example:

Suppose that one day your CEO asks you to report the ROI (return on investment) per customer for the budget spent on paid ads. To answer this question, you will have to merge data from marketing systems like Facebook Ads, Google AdWords, etc., with the data in your traditional transactional systems like MySQL or MongoDB. When you have gazillions of rows of data scattered across multiple sources, manually bringing the data together goes beyond the analyst’s role. Engineering bandwidth will have to be allocated to extract, clean, enrich and move the data to central storage, say a data warehouse, which lets you analyze massive amounts of data.
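To make the merge concrete, here is a minimal Python sketch of that ROI calculation on two tiny in-memory tables. Everything in it is invented for illustration: the table and column names, the figures, and the assumption that each order records the campaign that acquired the customer and that a campaign's spend is split evenly across those customers.

```python
from collections import defaultdict

# Hypothetical ad-platform export: total spend per campaign.
ad_spend = [
    {"campaign": "fb_spring", "spend": 200.0},
    {"campaign": "gads_brand", "spend": 100.0},
]

# Hypothetical transactional rows: each order carries the acquiring campaign.
orders = [
    {"customer_id": 1, "campaign": "fb_spring", "revenue": 300.0},
    {"customer_id": 2, "campaign": "fb_spring", "revenue": 150.0},
    {"customer_id": 3, "campaign": "gads_brand", "revenue": 250.0},
]


def roi_per_customer(ad_spend, orders):
    """Return {customer_id: ROI}, where ROI = (revenue - cost) / cost.

    Assumption: a campaign's spend is split evenly across the customers
    it acquired. Real attribution models are usually more involved.
    """
    spend_by_campaign = {row["campaign"]: row["spend"] for row in ad_spend}

    customers_by_campaign = defaultdict(set)
    revenue_by_customer = defaultdict(float)
    campaign_of_customer = {}
    for order in orders:
        customers_by_campaign[order["campaign"]].add(order["customer_id"])
        revenue_by_customer[order["customer_id"]] += order["revenue"]
        campaign_of_customer[order["customer_id"]] = order["campaign"]

    roi = {}
    for customer_id, revenue in revenue_by_customer.items():
        campaign = campaign_of_customer[customer_id]
        cost = spend_by_campaign[campaign] / len(customers_by_campaign[campaign])
        roi[customer_id] = (revenue - cost) / cost
    return roi
```

On three rows this is trivial. The hard part is doing the same thing continuously, across many sources and at warehouse scale.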

However, writing complex scripts to automate all of this is not easy. It gets harder if you want to stream your data in real time. Data loss becomes an everyday phenomenon because of changing sources, unstructured and unclean data, incorrect data mapping at the warehouse, and more. These are only a few of the many issues that will arise.

In light of increasingly rich data sources and scaling businesses, the traditional approaches to data integration and ETL do not make the cut for progressive companies.

We built the Hevo Data Integration Platform to help data teams bring data from hundreds of sources into the warehouse in minutes, without writing any code.

With Hevo, you can easily move data from any source into the data warehouse while transforming, cleaning and enriching the data on the fly. Hevo ensures that your data is streamed in real time with zero data loss.

How It Works

Hevo’s user-friendly Pipelines provide a simple interface that lets you instantly connect to a variety of sources, including SQL and NoSQL databases, SaaS applications, file storage and webhooks, and move the data to any destination. With an eye-catching visual interface, any user can set up a data pipeline in minutes and load data into a data warehouse or application. Additionally, the flexible data preparation capabilities allow users to enrich data on the go before loading it into the destination. Some of its powerful features are:

Bring Data from an Array of Sources

With Hevo, integrate a variety of data sources with the click of a button and bring data to your destination. Be it SaaS, marketing systems or traditional databases, we have covered it all. Here is the complete list.

Data Sources on Hevo

Easily connect a variety of Data Sources

On the destinations front, Hevo lets you stream data into Amazon Redshift, Google BigQuery, MySQL and PostgreSQL with zero data loss.

Transformations/Data Enrichment 

Enriching data is a cakewalk on Hevo. You can load a sample event from your source and write quick Python transformations to clean, aggregate and enrich your data. You can even split an incoming event into multiple arbitrary events, making it easy to normalize nested NoSQL data.

A preview allows you to test the result before you deploy.
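To give a feel for what such a transformation might look like, here is a minimal sketch in plain Python. It is not Hevo's transformation API: the function name, the event shape and the "table" routing key are all assumptions made for illustration. It cleans one field and splits a nested order document into a parent event plus one event per line item.

```python
def split_order_event(event):
    """Split one nested order document into flat events.

    Returns a parent "orders" event followed by one "order_items" event
    per line item. The "table" key is a hypothetical routing hint.
    """
    order_id = event["_id"]
    parent = {
        "table": "orders",
        "order_id": order_id,
        # Clean: trim and lowercase the email so later joins are consistent.
        "email": event.get("email", "").strip().lower(),
    }
    children = [
        {
            "table": "order_items",
            "order_id": order_id,
            "sku": item["sku"],
            # Default a missing quantity to 1 and coerce strings to int.
            "qty": int(item.get("qty", 1)),
        }
        for item in event.get("items", [])
    ]
    return [parent] + children


# A sample event, as you might load one from a document store.
sample = {
    "_id": "o-42",
    "email": "  Jane@Example.COM ",
    "items": [{"sku": "A1", "qty": "2"}, {"sku": "B7"}],
}
flat_events = split_order_event(sample)
```

Running the function on a sample event and inspecting `flat_events` is the same idea as the preview step: you check the output shape before anything is deployed.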

Transformations on Hevo

Build, Preview and Deploy Transformations

Schema Mapper

Schema Mapper allows you to define and store the data in your destination warehouse just the way you want. You can let Schema Mapper automatically map source fields and data types onto the destination, or do it manually. Schema Mapper gives you granular control over how your data is mapped. With Hevo’s automatic schema detection, you can easily detect schema modifications at the source or destination and handle them instantly from within the tool.
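The general idea behind automatic mapping and change detection can be sketched in a few lines of Python. This is an illustration only, not Hevo's algorithm: the type map, the function names and the choice of warehouse type names are assumptions.

```python
from datetime import datetime

# Hypothetical mapping from Python value types to warehouse column types.
TYPE_MAP = {
    bool: "BOOLEAN",
    int: "BIGINT",
    float: "DOUBLE PRECISION",
    str: "VARCHAR",
    datetime: "TIMESTAMP",
}


def infer_schema(event):
    """Map each field of a sample event to a column type (VARCHAR fallback)."""
    return {field: TYPE_MAP.get(type(value), "VARCHAR") for field, value in event.items()}


def diff_schema(source_schema, destination_schema):
    """Compare the inferred source schema with the destination table.

    Returns fields that are new at the source, fields whose type differs
    as (destination_type, source_type), and destination columns that the
    source no longer sends.
    """
    added = {f: t for f, t in source_schema.items() if f not in destination_schema}
    changed = {
        f: (destination_schema[f], t)
        for f, t in source_schema.items()
        if f in destination_schema and destination_schema[f] != t
    }
    missing = [f for f in destination_schema if f not in source_schema]
    return {"added": added, "changed": changed, "missing": missing}


source = infer_schema({"id": 7, "amount": 9.5, "coupon": "SPRING"})
destination = {"id": "BIGINT", "amount": "BIGINT", "legacy_flag": "VARCHAR"}
changes = diff_schema(source, destination)
```

Here `changes` reports a new `coupon` field, a type change on `amount`, and a `legacy_flag` column the source no longer populates. Each of these is a modification that someone then has to decide how to handle.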

Schema Mapper on Hevo

Auto-suggest on Schema Mapper

Replay Queue

What happens to the flowing data when there is a buggy transformation, a data type mismatch, or an unreachable destination? It goes into the Replay Queue. The Replay Queue holds all events that failed to load into your destination because of unforeseen errors. This gives you an opportunity to fix the problem without having to worry about data loss. As a cherry on top, the Replay Queue checks every few minutes whether the errors have been fixed and automatically moves the data to the destination.
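The pattern described here, parking failed events and retrying them later, can be sketched as follows. This is a toy in-memory version written for illustration, not Hevo's implementation: the class name, its methods and the `load` callable are assumptions, and a real system would persist the queue and run the retry on a timer.

```python
from collections import deque


class ReplayQueue:
    """Park events that fail to load and retry them later (in-memory sketch)."""

    def __init__(self, load):
        self._load = load        # callable that writes one event or raises
        self._failed = deque()   # (event, error message) pairs awaiting retry

    def send(self, event):
        """Try to load an event; park it instead of dropping it on failure."""
        try:
            self._load(event)
        except Exception as error:
            self._failed.append((event, str(error)))

    def replay(self):
        """Retry every parked event once, in order; return how many succeeded."""
        succeeded = 0
        for _ in range(len(self._failed)):
            event, _reason = self._failed.popleft()
            try:
                self._load(event)
                succeeded += 1
            except Exception as error:
                # Still failing: put it back at the end for the next pass.
                self._failed.append((event, str(error)))
        return succeeded

    def __len__(self):
        return len(self._failed)
```

A scheduler would call `replay()` every few minutes. Once the underlying problem is fixed, the parked events drain into the destination in their original order.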

Replay Queue on Hevo

Zero Data Loss with Replay Queue

Activity Log

Hevo’s Activity Log provides a one-stop view to watch all the activity that occurs within pipelines. From user activities to failure at any stage in your pipeline to successful executions, all of it is recorded here. 

Activity Log on Hevo

Monitor your pipelines with Activity Log

Pipeline Overview

The Overview feature gives you a panoramic view of your pipeline. It informs you of the activities happening at each stage of the data flow. It also shows you the jobs that Hevo is running to read your data and allows you to take various actions on them.

Overview of Pipelines on Hevo

Pipeline Overview

Slack Alerts 

You are notified of all the details related to your pipelines over email and Slack. Be it the status of the pipeline, the Replay Queue or a detected schema change, every piece of information is available at your fingertips through Slack. This makes it easy for you to monitor your pipelines and take quick action when necessary.

Slack Alerts on Hevo

Instant Slack Notifications

Models and Workflows

To top it all off, Hevo’s Modelling and Workflows features allow you to aggregate and join the data and store the results as materialized views in your warehouse. These views give you faster query response times, making report pulls possible in a matter of seconds. Moreover, you can build Workflows, i.e., Directed Acyclic Graphs (DAGs), on a drag-and-drop interface to define dependencies between different data models. Here is a detailed read on Models and Workflows on Hevo.
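To see why a workflow is modelled as a DAG, consider how a build order falls out of the dependencies. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). The model names are hypothetical, and this is only an illustration of the concept, not how Hevo schedules workflows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data models: each key depends on the models in its set.
dependencies = {
    "customer_revenue": {"orders_clean"},
    "ad_spend_daily": {"ads_clean"},
    "roi_per_customer": {"customer_revenue", "ad_spend_daily"},
}


def build_order(dependencies):
    """Return model names so that every model comes after its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(dependencies)
```

In `order`, `roi_per_customer` always comes after both of the models it joins. A cyclic definition is rejected up front instead of producing a workflow that can never finish.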

Data Models on Hevo

Build Data Models easily

Workflows on Hevo

Define Workflows on a Drag-and-Drop Interface

We’ve built a zero-loss, reliable, fault-tolerant data pipeline to power your business analytics. What are your thoughts about pipelines? Let us know in the comments.
