Data preparation is generally the most difficult, expensive, and time-consuming task in a typical analytics project. Data sets are often fragmented, incomplete, or lacking any structural consistency.
Power BI Dataflows help you tackle these challenges by letting you ingest, transform, clean, and integrate large volumes of data and map them into a standardized form using Power Query and a robust calculation engine. Dataflows simplify setting up a self-service ETL pipeline in Power BI.
This article aims at providing you with a step-by-step guide to help you set up an ETL process in Power BI with Dataflows using various methods.
Through this article, you will get a deep understanding of the tools and techniques being used, which will help you hone your skills further. It will help you build your customized ETL Pipeline for your organization.
What is Power BI?
Power BI is an intuitive and widely used Business Intelligence tool from Microsoft that allows users to aggregate, analyze, and visualize data, and to create and share compelling business reports across their teams.
It provides robust integration support with a variety of sources such as Excel, cloud-based or on-premises data warehouses, and many more, allowing teams to convert their data into immersive and actionable insights.
Power BI is available across three different platforms, namely Power BI Desktop for Windows, Power BI Service, a SaaS application, and Power BI Mobile, an iOS and Android app.
For further information on Power BI, you can check the official website here.
What are Power BI Dataflows?
Power BI Dataflows allow users to store data in the form of entities, or collections of entities, that work in a way similar to tables.
Dataflows follow a data warehousing approach and allow users to bring in data from a variety of sources, mapping all of it to a single schema designed by Microsoft using its Common Data Model. It then stores the data in its fully-managed Azure Data Lake Gen 2 service.
Dataflows allow setting up a complete self-service ETL process that lets teams across an organization not only ingest data from a variety of sources such as Salesforce, SQL Server, Dynamics 365, etc. but also convert it into an analysis-ready form.
With Dataflows in place, you can perform ETL directly in Power BI and create impeccable reports and dashboards with the help of Power Query, the robust calculation engine, and the Power BI Service.
With their intuitive and robust functionality, Dataflows simplify ETL and help bring down the cost, technical expertise, and resources required to tackle data preparation and ETL challenges, especially when working with huge volumes of data.
For further information on Dataflows, you can check the official documentation here.
How Dataflows Make Power BI ETL More Powerful
Here are some of the key features and capabilities of Dataflows that make Power BI ETL more powerful:
- Robust AI & Analytics with Azure: With Dataflows in place, teams can directly ingest data from a diverse set of sources into Azure Data Lake Gen 2, allowing analysts to make use of services such as Azure Databricks, advanced analytics, machine learning, SQL, etc. Dataflows make this functionality available even to Business Analysts, allowing them to carry out advanced data analysis and collaborate easily with data scientists.
- Common Data Model (CDM): Dataflows ensure data consistency with the help of Microsoft’s Common Data Model, a collection of standard data schemas. Dataflows automatically map different types of data from a diverse set of sources into a “common schema” and make it available across teams within an organization, allowing them to collaborate with ease. The Common Data Model also lets users customize the standard CDM entities to meet their specific business use cases.
- Self-Service Big Data Preparation: With Dataflows in place, users can not only bring in data from various sources but also clean and integrate data of different types and schemas into a universal Common Data Model schema. Powerful functionality such as Power Query and the calculation engine makes ETL a convenient, quick, and less expensive process. With a familiar environment and powerful engines, Dataflows make the tasks of Analysts, Engineers, etc. ultra-simple.
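The schema-mapping idea above can be sketched in plain Python. This is a minimal illustration of normalizing records from heterogeneous sources into one shared schema, not the actual CDM machinery; the source systems and field names are hypothetical:

```python
# Minimal sketch: normalize records from two hypothetical sources into one
# "common schema", the way Dataflows map diverse sources onto CDM entities.

COMMON_SCHEMA = ["account_id", "name", "created_on"]

def from_crm(record):
    # Hypothetical CRM field names.
    return {"account_id": record["AccountId"],
            "name": record["AccountName"],
            "created_on": record["CreatedDate"]}

def from_billing(record):
    # Hypothetical billing-system field names.
    return {"account_id": record["cust_id"],
            "name": record["cust_name"],
            "created_on": record["signup_ts"]}

crm_rows = [{"AccountId": 1, "AccountName": "Contoso", "CreatedDate": "2021-01-05"}]
billing_rows = [{"cust_id": 2, "cust_name": "Fabrikam", "signup_ts": "2021-02-10"}]

# Every record, regardless of its origin, now carries the same attributes.
unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
print(unified)
```

Once all sources land in one schema, downstream reports and models can be written once against the common entity instead of once per source.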
Hevo Data, a No-code Data Pipeline, helps to transfer data from 150+ sources to your desired data warehouses. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that data is handled in a secure, consistent manner with zero data loss.
Get Started with Hevo for Free
It provides a consistent & reliable solution to manage data in real-time and always has analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using various BI tools such as Power BI.
Check out what makes Hevo amazing:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, makes it extremely easy for new customers to get started and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Prerequisites
- Working knowledge of Power BI.
- Working knowledge of Azure and its services.
- An Azure account.
- A Power BI account.
Methods to Set up the Power BI ETL with Dataflows
There are multiple ways in which you can create Dataflows to set up Power BI ETL:
Method 1: Creating Dataflows using New Entities to set up Power BI ETL
This can be implemented using the following steps:
Step 1: Signing in to your Power BI Service Account
To begin setting up Dataflows, go to the official Power BI Service site and log in using your credentials such as your Microsoft username and password.
Once you’ve logged in, the Power BI dashboard will now open up on your screen as follows:
Click on the workspace option, found in the panel on the left, and open your desired workspace. Click on the Create option and select Dataflow.
Once you’ve clicked on the Dataflow option, the following screen will now open up, where you can choose between four different ways to set up Power BI ETL using Dataflows.
Click on the add new entities option to create a new table or entity for your Power BI Dataflows.
Step 2: Configuring your Data Source for Dataflows
Once you’ve selected the type of Dataflow you want to create, a list of supported data sources will now open up on your screen as follows. Choose your data source from a large set of available connectors:
Once you’ve selected your desired data source, you now need to configure your connection by providing your connection details such as the server & database details, along with your credentials such as username and password. This is required to allow Azure to access your data with ease.
Once you’ve logged in successfully, you can access your data directly from the Azure Data Lake with the help of Power Query.
With Dataflows in place, Power BI will automatically refresh your connection and ensure that you always have up-to-date data in your Azure Data Lake.
Step 3: Setting up Incremental Refresh for Dataflows
Power BI Dataflows’ Incremental Refresh functionality automatically updates the data stored in your Azure Data Lake Gen 2 periodically and replicates any new update made to your data source.
To set this up, click on the Dataflow option and then select the Incremental Refresh option found at the right end of your screen for every entity you’ve created.
The Incremental Refresh settings dialogue box will now open up on your screen as follows:
Provide the following information carefully:
- On/off Toggle: Toggle the switch to the on position to start the Incremental Refresh process.
- DateTime: Select the DateTime field of your entity based on which the data refresh will take place.
- Store Rows: Enter how far back in time you want to bring in and store data. For long periods, the first data refresh may take some time to complete.
- Refresh Rows: Enter the period for which you want Dataflows to refresh your data on each scheduled run.
- Detect Data Changes: Checking this box causes Dataflows to check when the data was updated at the source and refresh a given period only if it contains new changes.
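The Store/Refresh window settings above can be mimicked with a short Python sketch. This is a hedged illustration of the windowing logic only; the entity and field names are made up, and the real refresh runs inside the Power BI service:

```python
from datetime import date, timedelta

def rows_to_refresh(rows, date_field, refresh_days, today):
    """Keep only rows whose DateTime field falls inside the refresh window,
    mimicking how Incremental Refresh reloads just the recent period."""
    cutoff = today - timedelta(days=refresh_days)
    return [r for r in rows if r[date_field] >= cutoff]

# Hypothetical entity rows, keyed on a DateTime field ("modified_on").
orders = [
    {"order_id": 1, "modified_on": date(2021, 1, 1)},
    {"order_id": 2, "modified_on": date(2021, 6, 28)},
    {"order_id": 3, "modified_on": date(2021, 6, 30)},
]

# Refresh only the last 7 days; older rows stay as already stored in the lake.
recent = rows_to_refresh(orders, "modified_on", 7, today=date(2021, 6, 30))
print([r["order_id"] for r in recent])  # → [2, 3]
```

Only the rows inside the refresh window are reloaded, which is what makes incremental refresh far cheaper than a full reload for large entities.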
This is how you can use New Entities to set up Power BI ETL using Dataflows.
Limitations of using New Entities for setting up Dataflows
- Using Dataflows requires mapping data into a standard Common Data Model format, making it challenging for companies that have unique mappings.
- There are row-level security issues at times.
- Incremental Refreshes don’t work with Dataflows that have Common Data Service as their desired destination. It further doesn’t work with entities that refer to Dataflows.
Method 2: Creating Dataflows using Linked Entities to set up Power BI ETL
Dataflows’ Linked Entity functionality allows users to refer to an entity created previously in any other Dataflow, thereby reusing data that already exists in the Azure Data Lake and letting teams build upon each other’s work. With Linked Entities in place, users can customize their ETL pipelines by first creating multiple Dataflows, linking them together, and then using them to build their pipeline.
The Linked Entities functionality is only available to users that have a Power BI Premium subscription.
Setting up Dataflows by adding Linked Entities
To set up Dataflows using a Linked Entity, select the workspace option, from the panel on the left. Click on the create option found at the top right corner of your screen and click on Dataflow.
The Dataflow settings page will now open on your screen, where you need to click on the add Linked Entities option as follows:
You can now bring in data from various existing entities such as Sales and Services and link them together to create a new Dataflow. These can then be analyzed using Power BI to create informative reports and dashboards and draw crucial insights. Dataflows allow users to independently refresh each of the linked Dataflows to ensure that they always have up-to-date data for analysis.
This is how you can use Linked Entities to set up Power BI ETL using Dataflows.
Limitation of using Linked Entities for setting up Dataflows
- Only users that have a Premium Power BI subscription can make use of the Linked Entities functionality.
Are you tired of following a tedious ETL setup process? Click here to check out how Hevo can help you set up your ETL Pipelines for 150+ sources in a hassle-free and completely automated manner.
Method 3: Creating Dataflows using Computed Entities to set up Power BI ETL
Using Computed Entities allows users to work with existing Linked Entities that are already present in the Azure Data Lake Gen 2. It lets you update or transform the already ingested data, making use of the Dataflows’ powerful Recalculation Engine, which takes care of all data & workspace dependencies. When you use a Computed Entity, it creates a new entity for the Dataflow, by performing a transformation on the Linked Entities in a read-only manner.
You can create a Computed Entity by transforming a Linked Entity, either through a merge operation or by creating a duplicate of the entity. Creating a duplicate entity allows you to perform multiple transformations and updates on the data. The Computed Entities functionality is only available to users that have a Power BI Premium subscription.
This can be implemented using the following steps:
Step 1: Signing in to your Power BI Service Account
Go to the official Power BI Service website and log in with your credentials such as your username and password.
Step 2: Transforming the Entity to a Computed Entity
Once you’ve logged in, go to the Dataflow authoring tool and click on the edit entities option. A list of all your Dataflow entities will now open up on your screen, select the entity that you want to use as the foundation of your Computed Entity.
Right-click on the desired entity, and a new dialogue box, known as the context menu, will now open up on your screen.
Ensure that the Enable load option is checked, as this is required for the entity to be eligible. Click on the Reference option. Once you’ve clicked on it, a new entity marked with a lightning icon will now appear, indicating that it is now a Computed Entity.
Any operation that you perform, such as summary, distinct counts, or transformation, will now happen on the data stored in your Azure Data Lake, rather than on the data source of your entity.
For example, if you want to combine raw data from Account and Service Call entities to draw crucial insights, you can do this using Computed Entities with ease.
To do this, you will have to aggregate the data on an attribute such as the account_id, and then create a new attribute to store the newly transformed data.
Once you’ve performed the desired operation on one of the entities, you can use the merge operation to combine data from the Account entity and “ServiceCallsAggregated” entity, using a SQL-based join, such as Left Join as follows:
Once you’ve selected the type of join, click on ok. You can now find the modified data in the newly created EnrichedAccount Computed Entity as follows:
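As a rough illustration of what this aggregate-then-left-join step does, here is a minimal Python sketch using in-memory records. The entity and attribute names (Account, service calls, account_id) follow the example above but are otherwise hypothetical:

```python
# Hypothetical raw entities; attribute names are illustrative.
accounts = [
    {"account_id": 1, "account_name": "Contoso"},
    {"account_id": 2, "account_name": "Fabrikam"},
]
service_calls = [
    {"account_id": 1, "duration_min": 30},
    {"account_id": 1, "duration_min": 45},
]

# Step 1: aggregate service calls per account ("ServiceCallsAggregated").
agg = {}
for call in service_calls:
    a = agg.setdefault(call["account_id"], {"total_calls": 0, "total_duration": 0})
    a["total_calls"] += 1
    a["total_duration"] += call["duration_min"]

# Step 2: left join onto Account to build the "EnrichedAccount" entity —
# every account is kept, even those without matching service calls.
enriched = [
    {**acct, **agg.get(acct["account_id"], {"total_calls": 0, "total_duration": 0})}
    for acct in accounts
]
print(enriched)
```

The left join is what preserves accounts with no service-call history; an inner join would silently drop them from the enriched entity.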
This is how you can use Computed Entities to perform transformations on the data already residing in the Azure Data Lake and set up Power BI ETL using Dataflows.
Limitations of using Computed Entities for Setting up Dataflows
- Only users that have a Premium Power BI subscription can make use of the Computed Entities functionality.
- Computed Entities work with Linked Entities only when both reside in Azure Data Lake storage associated with the same account.
Method 4: Using a CDM Folder in Dataflows to set up Power BI ETL
Using CDM folders to create Dataflows allows users to refer to an entity with attributes that follow the Common Data Model format. These entities are created by some other application and are already present in Azure Data Lake.
To create a Dataflow using CDM folders, you must meet the following requirements:
- The user trying to create the Dataflow must have access to the account associated with Azure Data Lake.
- Power BI must have permission to access the file from Azure Data Lake.
- CDM Folder based Dataflows can be created only in new workspaces.
- The file path of the JSON files must have an Azure Data Lake Gen 2 endpoint.
Once you have all the necessary permissions, you can create the Dataflow by providing the complete URL or folder path for the CDM file stored in your Azure Data Lake, along with a name for your Dataflow.
This is how you can use CDM Folders to set up Power BI ETL using Dataflows.
Limitations of using CDM Folders for setting up Dataflows
- Using CDM files requires users to have access to the account associated with Azure Data Lake Gen 2. This can be challenging, as users trying to create a Dataflow may not have that access.
- Dataflows don’t support using “blob.core” endpoints while specifying the URL for the CDM File.
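To illustrate the endpoint requirement, here is a small hedged Python helper that rewrites a Blob Storage URL into the corresponding Azure Data Lake Gen 2 “dfs” endpoint. The storage account name and folder path are made up for the example:

```python
def to_dfs_endpoint(url: str) -> str:
    """Rewrite an Azure Blob Storage URL to its ADLS Gen 2 'dfs' endpoint,
    since Dataflows reject 'blob.core' endpoints for CDM folder paths."""
    return url.replace(".blob.core.windows.net", ".dfs.core.windows.net")

# Hypothetical storage account and CDM folder path:
blob_url = "https://mystorageacct.blob.core.windows.net/powerbi/sales/model.json"
print(to_dfs_endpoint(blob_url))
# → https://mystorageacct.dfs.core.windows.net/powerbi/sales/model.json
```

If the URL you paste into the Dataflow settings contains “blob.core”, swapping it for the “dfs.core” form of the same path is usually all that is needed.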
This article introduces you to the various methods that can be used to set up Power BI ETL with Dataflows. It also provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging, especially for a beginner, and this is where Hevo saves the day.
Hevo Data, a No-Code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write any code. Hevo, with its strong integration with 150+ sources, allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.
Visit our Website to Explore Hevo
Want to take Hevo for a spin?
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.
Tell us about your preferred way of using Dataflows to set up Power BI ETL! Let us know your thoughts in the comments section below.