Data preparation is often the most challenging, costly, and time-consuming part of an analytics project. Datasets can be fragmented, incomplete, or structurally inconsistent, making them difficult to work with. Power BI Dataflows can help overcome these challenges by allowing you to ingest, transform, clean, and integrate large volumes of data into a standardized form using its calculation engine and Power Query. This simplifies the creation of a self-service Power BI ETL pipeline.

In this article, I’ll guide you through setting up an ETL process in Power BI using Dataflows. You’ll learn the tools and techniques needed to build your own customized ETL pipeline, helping you streamline data handling for your organization and sharpen your skills along the way.

What is Power BI?


Power BI is an intuitive Business Intelligence tool from Microsoft and one of the most popular on the market. It allows users to aggregate, analyze, and visualize data, and to create and share compelling business reports across their teams.

It provides robust integration support with a variety of sources such as Excel, cloud-based or on-premises data warehouses, and many more, allowing teams to convert their data into immersive and actionable insights.

Power BI is available on three platforms: Power BI Desktop, a Windows application; Power BI Service, a SaaS application; and Power BI Mobile, an app for iOS and Android.

What is a Dataflow?

Power BI ETL: Dataflows in Power BI.

Power BI Dataflows allow you to store data in entities, which function like tables. Following a data warehousing approach, Dataflows let you bring in data from multiple sources—like Salesforce, SQL Server, and Dynamics 365—and map it to a single schema using Microsoft’s Common Data Model. The data is then stored in the fully-managed Azure Data Lake Gen 2 service, ready for analysis.

With Dataflows, you can set up a self-service ETL pipeline, enabling teams to ingest data and transform it into an analysis-ready format. Using Power BI to perform ETL, along with Power Query and the robust Calculation Engine, you can easily create reports and dashboards. This intuitive approach simplifies data preparation, reducing the need for extensive technical knowledge, resources, and costs, especially when dealing with large data volumes.
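
To make this concrete, every Dataflow entity is ultimately defined by a Power Query M query. Below is a minimal sketch of such a query, assuming a hypothetical CSV file of orders; the URL and column names are illustrative, not from a real source.

    let
        // Extract: read a raw CSV file over HTTP (placeholder URL)
        Source = Csv.Document(Web.Contents("https://example.com/orders.csv"), [Delimiter = ",", Encoding = 65001]),
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        // Transform: enforce data types so the entity is analysis-ready
        Typed = Table.TransformColumnTypes(Promoted, {{"OrderId", Int64.Type}, {"OrderDate", type datetime}, {"Amount", type number}}),
        // Clean: drop incomplete rows before the entity is stored in the lake
        Cleaned = Table.SelectRows(Typed, each [OrderId] <> null)
    in
        Cleaned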

Simplify your data analysis with Hevo’s No-code Data Pipelines

Hevo Data, a No-code Data Pipeline, helps to transfer data from 150+ sources to your desired data warehouses. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that data is handled in a secure, consistent manner with zero data loss.

Check out what makes Hevo amazing:

  • Secure & Reliable: Hevo’s fault-tolerant architecture ensures secure, consistent data handling with zero loss and automatic schema management.
  • User-Friendly & Scalable: Hevo’s simple UI makes it easy for new users, while its horizontal scaling manages growing data volumes with minimal latency.
  • Efficient Data Transfer: Hevo supports real-time, incremental data loads, optimizing bandwidth usage for both ends.
  • Live Monitoring & Support: Hevo provides live data flow monitoring and 24/5 customer support via chat, email, and calls.
Sign up here for a 14-Day Free Trial!

How Dataflows Make Power BI ETL More Powerful

Power BI ETL: Power BI Dataflows.

Here are some of the key features and capabilities of Dataflows that make ETL more powerful:

  • Robust AI & Analytics with Azure: With Dataflows in place, teams can directly ingest data from a diverse set of sources into Azure Data Lake Gen 2, allowing analysts to make use of services such as Azure Databricks, advanced analytics, machine learning, and SQL. Dataflows make this functionality available even to Business Analysts, allowing them to carry out advanced data analysis and easily collaborate with data scientists.
  • Common Data Model (CDM): Dataflows ensure data consistency with the help of Microsoft’s Common Data Model, a collection of standard data schemas. Dataflows automatically map different types of data from a diverse set of sources into a “common schema” and make it available across teams within an organization, allowing them to collaborate with ease. The Common Data Model also lets users customize the standard CDM entities to meet their specific business use cases (see the sketch after this list).
  • Self-Service Big Data Preparation: With Dataflows in place, users can not only bring in data from various sources but also clean and integrate data of different types and schemas into a universal Common Data Model schema. Powerful functionality such as Power Query and the calculation engine makes ETL a convenient, quick, and less expensive process. With a familiar environment and powerful tools, Dataflows greatly simplify the work of analysts and engineers.
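
As a concrete illustration of the CDM mapping described above, here is a hedged Power Query M sketch that reshapes two differently-named sources onto one shared schema. The inline tables, column names, and source names are made-up stand-ins for real connectors.

    let
        // Stand-ins for two real sources that name the same concepts differently
        CrmSource = #table({"acct_id", "acct_name"}, {{1, "Contoso"}}),
        ErpSource = #table({"CustomerNo", "CustomerName"}, {{2, "Fabrikam"}}),
        // Rename each source's columns to one common schema
        CrmAccounts = Table.RenameColumns(CrmSource, {{"acct_id", "AccountId"}, {"acct_name", "Name"}}),
        ErpAccounts = Table.RenameColumns(ErpSource, {{"CustomerNo", "AccountId"}, {"CustomerName", "Name"}}),
        // Both now share one schema and can be served as a single entity
        Combined = Table.Combine({CrmAccounts, ErpAccounts})
    in
        Combined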

Prerequisites

  • Working knowledge of Power BI.
  • Working knowledge of Azure and its services.
  • An Azure account.
  • A Power BI account.

Methods to Set up the Power BI ETL with Dataflows

There are multiple ways in which you can create Dataflows to set up Power BI ETL:

Method 1: Creating Dataflows using New Entities to set up Power BI ETL

This can be implemented using the following steps:

Step 1: Signing in to your Power BI Service Account

To begin setting up Dataflows, go to the official Power BI Service site and log in with your Microsoft credentials (username and password).

Power BI ETL: Power BI Login Page.

Once you’ve logged in, the Power BI dashboard will now open up on your screen as follows:

Power BI ETL: Power BI Dashboard.

Click on the workspace option, found in the panel on the left, and open your desired workspace. Click on the Create option and select Dataflow.

Power BI ETL: Creating Dataflows in Power BI.

Once you’ve clicked on the Dataflow option, the following screen will open up, where you can choose from four different ways to set up Power BI ETL using Dataflows.

Power BI ETL Dataflow Methods.

Click on the add new entities option to create a new table or entity for your Power BI Dataflows.

Step 2: Configuring your Data Source for Dataflows

Once you’ve selected the type of Dataflow you want to create, a list of supported data sources will now open up on your screen as follows. Choose your data source from a large set of available connectors:

Power BI ETL: List of Supported Data Sources.

Once you’ve selected your desired data source, you need to configure the connection by providing details such as the server and database, along with credentials such as your username and password. This is required to allow Azure to access your data.

Power BI ETL: Connecting Azure with the data source.

Once you’ve logged in successfully, you can access your data directly from the Azure Data Lake with the help of Power Query. 

Power BI ETL: Choosing data from Azure Data Lake.

With Dataflows in place, Power BI will automatically refresh your connection and ensure that you always have up-to-date data in your Azure Data Lake.
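
For reference, the connector configuration above simply generates an ordinary Power Query M query. A minimal sketch for a SQL Server source follows; the server, database, schema, table, and column names are placeholders, not real values.

    let
        // Connect to the configured SQL Server source (placeholder names)
        Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
        // Navigate to the table that will back the new entity
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // Keep only the columns the entity needs
        Selected = Table.SelectColumns(Orders, {"OrderId", "OrderDate", "Amount"})
    in
        Selected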

Step 3: Setting up Incremental Refresh for Dataflows

Power BI Dataflows’ Incremental Refresh functionality automatically and periodically updates the data stored in your Azure Data Lake Gen 2 and replicates any new updates made to your data source.

To set this up, click on the Dataflow option and then select the Incremental Refresh option found at the right end of your screen for every entity you’ve created.

Power BI ETL: Incremental Refresh Option.

The Incremental Refresh settings dialog box will now open up on your screen as follows:

Power BI ETL: Incremental Refresh Settings Dialogue Box.

Provide the following information carefully (a sketch of the kind of DateTime filter these settings generate follows the list):

  • On/off Toggle: Toggle the switch to the on position to enable Incremental Refresh.
  • DateTime: Select the DateTime field of your entity on which the data refresh will be based.
  • Store Rows: Enter the period, counting back from today, for which you want to bring in and store data. For long periods, the first data refresh may take some time.
  • Refresh Rows: Enter the period, counting back from today, for which Dataflows should refresh your data on each run.
  • Detect Data Changes: Checking this box makes Dataflows check on which days updates took place at the source, and refresh the data for a period only if something actually changed.
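
Under the hood, these settings translate into a filter on the chosen DateTime field. The following Power Query M sketch only mimics that behavior: the inline table and the 5-day window are assumptions, and in practice the filter is generated by the settings dialog rather than written by hand.

    let
        // Stand-in entity with a DateTime column (assumed sample data)
        Source = #table(
            type table [OrderId = Int64.Type, OrderDate = datetime],
            {{1, #datetime(2024, 1, 1, 0, 0, 0)}, {2, #datetime(2024, 6, 1, 0, 0, 0)}}
        ),
        // Keep only rows inside the rolling refresh window, similar to the
        // filter the Incremental Refresh settings derive from the DateTime field
        WindowStart = Date.AddDays(DateTime.LocalNow(), -5),
        Refreshed = Table.SelectRows(Source, each [OrderDate] >= WindowStart)
    in
        Refreshed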

This is how you can use New Entities to set up Power BI ETL using Dataflows.

Limitations of using New Entities for setting up Dataflows

  • Using Dataflows requires mapping data into a standard Common Data Model format, making it challenging for companies that have unique mappings. 
  • There are row-level security issues at times.
  • Incremental Refresh doesn’t work with Dataflows that have Common Data Service as their destination. It also doesn’t work with entities that reference other Dataflows.

Method 2: Creating Dataflows using Linked Entities to set up Power BI ETL

Dataflows’ Linked Entity functionality allows users to refer to an entity that already exists in another Dataflow, thereby letting them reuse data that is already in the Azure Data Lake and build on each other’s work. With Linked Entities in place, users can customize their ETL pipelines by first creating multiple Dataflows, linking them together, and then using them to build the pipeline.

The Linked Entities functionality is only available to users that have a Power BI Premium subscription.

Setting up Dataflows by adding Linked Entities

To set up Dataflows using a Linked Entity, select the workspace option from the panel on the left. Click on the create option found at the top right corner of your screen and click on Dataflow.

Power BI ETL: Creating a Dataflow.

The Dataflow settings page will now open on your screen, where you need to click on the add Linked Entities option as follows:

Power BI ETL: Adding Entities to the Dataflow.

You can now bring in data from various existing entities, such as Sales and Services, and link them together to create a new Dataflow. These can then be analyzed using Power BI to create informative reports and dashboards and draw crucial insights. Dataflows allow users to independently refresh each of the linked Dataflows to ensure that they always have up-to-date data for analysis.

Power BI ETL: Dataflows with Linked Entities.
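
In the Dataflow editor, a linked entity appears as a plain named step that downstream queries can build on. The Power Query M sketch below imitates that pattern; the inline Sales table is an assumed stand-in, whereas in a real Dataflow that step would be a reference to an entity in another Dataflow.

    let
        // Stand-in for a linked Sales entity (assumed columns and rows)
        Sales = #table({"AccountId", "Revenue"}, {{1, 500}, {2, 300}}),
        // A new entity builds on the linked one without re-ingesting the source
        HighValueAccounts = Table.SelectRows(Sales, each [Revenue] >= 400)
    in
        HighValueAccounts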

This is how you can use Linked Entities to set up Power BI ETL using Dataflows.

Limitation of using Linked Entities for setting up Dataflows

  • Only users with a Power BI Premium subscription can make use of the Linked Entities functionality.

Method 3: Creating Dataflows using Computed Entities to set up Power BI ETL

Using Computed Entities allows users to work with Linked Entities that are already present in the Azure Data Lake Gen 2. It lets you update or transform already ingested data using Dataflows’ powerful recalculation engine, which takes care of all data and workspace dependencies. When you use a Computed Entity, it creates a new entity for the Dataflow by performing a transformation on the Linked Entities in a read-only manner.

You can create a Computed Entity by transforming a Linked Entity, either with a merge operation or by creating a duplicate of the entity. Creating a duplicate allows you to perform multiple transformations and updates on the data. The Computed Entities functionality is only available to users with a Power BI Premium subscription.

This can be implemented using the following steps:

Step 1: Signing in to your Power BI Service Account

Go to the official Power BI Service website and log in with your credentials such as your username and password.

Power BI ETL: Power BI Login Page.

Step 2: Transforming the Entity to a Computed Entity

Once you’ve logged in, go to the Dataflow authoring tool and click on the edit entities option. A list of all your Dataflow entities will now open up on your screen; select the entity that you want to use as the foundation of your Computed Entity.

Right-click on the desired entity, and a new dialog box, known as the context menu, will open up on your screen.

Ensure that the enable load option is ticked, as this is a must for the entity to be eligible. Click on the reference option. Once you’ve clicked on it, a new entity with a power sign will appear, indicating that it is now a Computed Entity.

Power BI ETL: Computed Entity Power Sign.

Any operation that you perform, such as summaries, distinct counts, or transformations, will now happen on the data stored in your Azure Data Lake, rather than on the data source of your entity.

For example, if you want to combine raw data from Account and Service Call entities to draw crucial insights, you can do this using Computed Entities with ease.

To do this, you will have to aggregate the data on an attribute such as the account_id, and then create a new attribute to store the newly transformed data.

Power BI ETL: Grouping Entities by accountid.

Once you’ve performed the desired operation on one of the entities, you can use the merge operation to combine data from the Account entity and the ServiceCallsAggregated entity, using a SQL-style join such as a left join, as follows:

Power BI ETL: Merge Operation using Joins.

Once you’ve selected the type of join, click on OK. You can now find the modified data in the newly created EnrichedAccount Computed Entity as follows:

Power BI ETL: Computed Entity.
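
Expressed in Power Query M, the grouping and merge described above look roughly like the sketch below. The inline Account and ServiceCalls tables are assumed stand-ins; the step names mirror the example (account_id, ServiceCallsAggregated, EnrichedAccount), but the column details are illustrative.

    let
        // Assumed stand-ins for the Account and Service Call entities
        Account = #table({"account_id", "Name"}, {{1, "Contoso"}, {2, "Fabrikam"}}),
        ServiceCalls = #table({"account_id", "DurationMinutes"}, {{1, 30}, {1, 45}, {2, 20}}),
        // Aggregate the service calls per account, as in the grouping step
        ServiceCallsAggregated = Table.Group(ServiceCalls, {"account_id"}, {{"TotalDuration", each List.Sum([DurationMinutes]), type number}}),
        // Merge with a left outer join, as in the merge step above
        Merged = Table.NestedJoin(Account, {"account_id"}, ServiceCallsAggregated, {"account_id"}, "Calls", JoinKind.LeftOuter),
        // Expand the aggregate into the enriched entity
        EnrichedAccount = Table.ExpandTableColumn(Merged, "Calls", {"TotalDuration"})
    in
        EnrichedAccount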

This is how you can use Computed Entities to perform transformations on the data already residing in the Azure Data Lake and set up Power BI ETL using Dataflows. 

Limitations of using Computed Entities for Setting up Dataflows

  • Only users with a Power BI Premium subscription can make use of the Computed Entities functionality.
  • Computed Entities work with Linked Entities only when both reside in an Azure Data Lake associated with the same account.

Method 4: Using a CDM Folder in Dataflows to set up Power BI ETL

Using CDM folders to create Dataflows allows users to refer to entities whose attributes follow the Common Data Model format. These entities are created by another application and are already present in the Azure Data Lake.

To create a Dataflow using CDM folders, you must meet the following requirements:

  • The user trying to create the Dataflow must have access to the account associated with Azure Data Lake.
  • Power BI must have permission to access the file from Azure Data Lake.
  • CDM Folder-based Dataflows can be created only in new workspaces.
  • The file path of the JSON file must point to an Azure Data Lake Gen 2 endpoint.

Once you have all the necessary permissions, you can create the Dataflow by providing the complete URL or folder path for the CDM file stored in your Azure Data Lake, along with a name for your Dataflow.
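
As a hedged illustration, a CDM folder path on an Azure Data Lake Gen 2 endpoint typically follows the pattern below; the storage account, filesystem, and folder names are placeholders (note the dfs endpoint, since blob.core endpoints are not supported, as listed under the limitations):

    https://<storage-account>.dfs.core.windows.net/<filesystem>/<cdm-folder>/model.json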

This is how you can use CDM Folders to set up Power BI ETL using Dataflows.

Limitations of using CDM Folders for setting up Dataflows

  • Using CDM files requires users to have access to the account associated with Azure Data Lake Gen 2. This can be challenging, as many users trying to create a Dataflow may not have that access.
  • Dataflows don’t support using “blob.core” endpoints while specifying the URL for the CDM File.

Conclusion

This article introduces you to the various methods that can be used to set up Power BI ETL with Dataflows. It also provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging, especially for a beginner, and this is where Hevo saves the day.

Hevo Data, a No-Code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write any code. Hevo, with its strong integration with 150+ sources, allows you to not only export and load data but also transform and enrich it, making it analysis-ready in a jiffy.

Want to take Hevo for a spin?

Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

Tell us about your preferred way of using Dataflows to set up Power BI ETL! Let us know your thoughts in the comments section below.

FAQs

1. Is Power BI an ETL tool?

Power BI is primarily a data visualization and business intelligence tool, not an ETL (Extract, Transform, Load) tool. However, it has some ETL capabilities through Power Query, which allows users to extract, transform, and load data into Power BI for reporting and analysis.

2. What is the ETL process in BI?

The ETL process in Business Intelligence (BI) involves three main steps: Extracting data from various sources, Transforming it into a usable format by cleaning and processing, and Loading the transformed data into a data warehouse or BI tool for analysis and reporting. This process helps organizations consolidate data for better decision-making.

3. Is Power Query ETL or ELT?

Power Query is primarily an ETL tool, as it extracts data from various sources, transforms it (such as cleaning or reshaping), and then loads it into Power BI or Excel for analysis. While it can handle some ELT tasks, its main strength lies in transforming data before loading it into the final destination.

4. Does Microsoft have an ETL tool?

Yes, Microsoft offers several ETL tools, with SQL Server Integration Services (SSIS) being the most well-known. SSIS is a powerful ETL tool used for data integration, transformation, and loading, and it’s part of the SQL Server suite. Additionally, Power BI and Azure Data Factory also provide ETL capabilities.

Divij Chawla
Marketing Operations and Analytics Manager, Hevo Data

Divij Chawla is interested in data analysis, software architecture, and technical content creation. With extensive experience driving Marketing Operations and Analytics teams, he excels at defining strategic objectives, delivering real-time insights, and setting up efficient processes. His technical expertise includes developing dashboards, implementing CRM systems, and optimizing sales and marketing workflows. Divij combines his analytical skills with a deep understanding of data-driven solutions to create impactful content and drive business growth.