Microsoft’s Power BI is a technology-driven Business Intelligence tool for analyzing and visualizing raw data to surface actionable insights. It brings together Business Analytics, Data Visualization, and best practices to help organizations make data-driven decisions. On the strength of the platform’s capabilities, Gartner named Microsoft a Leader in the February 2019 “Gartner Magic Quadrant for Analytics and Business Intelligence Platforms.”
Power BI Aggregations can speed up queries on very large DirectQuery datasets. You can cache data at the aggregated level in memory using aggregations.
This article discusses Power BI Aggregations in detail, describes the two types of aggregations and their critical aspects, and includes a brief introduction to Power BI.
What is Power BI?
Power BI is a proprietary Business Intelligence tool designed for seamless Data Analytics and Data Visualization. It is a part of the Microsoft Power Platform. Power BI is one of the most common tools organizations use to analyze their business data and build reports.
It comes with a collection of built-in software services, apps, and connectors that integrate deeply with data sources to deliver immersive visuals, interactive reports, and insights. Power BI combines advanced charts, graphs, and other visuals with Machine Learning to easily extract valuable information from data.
Power BI is a great tool for performing Data Analytics and Visualization for your business data. However, at times, you need to transfer this data from multiple sources to your Power BI account for analysis.
Power BI Desktop is a self-service data analysis and report-authoring tool for Windows computers that you can download for free. It can connect to over 70 on-premises and cloud data sources to create interactive visuals from data. Data scientists and developers use Power BI Desktop to create reports and publish them to the Power BI service.
Power BI can also read data from XML files, CSV files, JSON files, and even web pages, then convert raw data into interactive insights. It is available for Desktop, mobile, and on-premises servers. Users can create and share their reports with other Power BI users within the organization or partner companies.
Key Features of Power BI
Some of the main features of Power BI are listed below:
- Supports API Integrations: Power BI allows developers to easily integrate with other applications and embed dashboards into other software using sample codes and APIs.
- Custom Visualization: Power BI offers custom visualization libraries support that allows users to visualize complex data with ease.
- AI Support: Users can easily perform Data Analytics using Artificial Intelligence. With the built-in AI support, users can prepare data, build Machine Learning models, and gain insights.
- Modeling View: With Modeling View, Power BI users can break complex data models into simpler ones, work with separate diagrams, and multi-select objects, making the data easier to understand.
- Easy Sharing: Power BI makes it easy for users to share reports within teams or organizations while ensuring full data protection.
- Hybrid Development: Power BI easily integrates with many 3rd party connectors, applications, and services widely used by organizations that allow users to connect to various data sources.
Understanding Power BI Aggregations
Power BI Aggregations: User-Defined Aggregations
Power BI Aggregations can improve query performance over very large DirectQuery datasets by caching data at the aggregated level in memory. With user-defined aggregations, you configure the aggregations manually in the data model.
Creating Power BI Aggregation Tables
- A Power BI aggregations table can be created as a table or view at the data source, as a native query, or as an import table created in Power Query, depending on the data source type. You then define aggregations for aggregation columns with summarization, detail table, and detail column properties in Power BI Desktop’s Manage aggregations dialog.
- Relationship-based Aggregations can be used in dimensional data sources such as warehouses and Data Marts. Power BI Aggregations are frequently based on GroupBy columns in Hadoop-based Big Data sources.
Manage Power BI Aggregations
- Right-click the Power BI Aggregations Table in the Fields pane of any Power BI Desktop view, then select Manage Aggregations from the menu.
- In the Manage Aggregations dialog, you can specify the aggregation behavior for each column in the table. In the following example, queries to the Sales detail table are internally redirected to the Sales Agg aggregation table.
- The GroupBy entries are optional in this relationship-based aggregation example. They have no effect on aggregation behavior and are primarily for readability, with the exception of DISTINCTCOUNT. The Power BI aggregations would still be hit without the GroupBy entries because of the relationships.
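Conceptually, a relationship-based aggregation hit is a lookup from a detail-table request to a pre-aggregated column. The sketch below illustrates the idea in Python (this is an illustration of the redirection concept, not Power BI internals; the mapping entries and column names such as `SalesAmount_Sum` follow the Sales / Sales Agg example but are assumptions):

```python
# Illustrative sketch: redirecting a summarization request on a detail
# table to an aggregation table when a matching mapping exists.
AGG_MAPPINGS = {
    # (detail table, detail column, summarization) -> (agg table, agg column)
    ("Sales", "SalesAmount", "Sum"): ("Sales Agg", "SalesAmount_Sum"),
    ("Sales", "SalesAmount", "Count"): ("Sales Agg", "SalesAmount_Count"),
}

def redirect(table, column, summarization):
    """Return the (table, column) a subquery should actually read from."""
    hit = AGG_MAPPINGS.get((table, column, summarization))
    # On a miss, fall back to the detail table (DirectQuery to the source).
    return hit if hit else (table, column)

print(redirect("Sales", "SalesAmount", "Sum"))  # hit: answered from Sales Agg
print(redirect("Sales", "UnitPrice", "Sum"))    # miss: falls back to Sales
```

The point of the sketch is that report queries keep referencing the detail table; the engine decides per subquery whether a cached aggregation can answer it.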
Validations
The following validations are enforced in the Manage Aggregations dialog:
- Except for the Count and Count Table Rows Summarization functions, the Detail Column must have the same datatype as the Aggregation Column. Only Integer Aggregation Columns support Count and Count Table rows, which don’t require a matching datatype.
- Chained Power BI aggregations of three or more tables are not permitted. Aggregations in Table A, for example, cannot refer to aggregations in Table B, which refer to aggregations in Table C.
- Duplicate Power BI aggregations aren’t allowed when two entries have the same Summarization Function and refer to the same Detail Table and Detail Column.
- The Detail Table must use DirectQuery storage mode, not Import.
- Grouping by a foreign-key column used by an inactive relationship, and relying on the USERELATIONSHIP function for aggregation hits, is not supported.
- Power BI aggregations based on GroupBy columns can leverage relationships between aggregation tables, but authoring relationships between aggregation tables is not supported in Power BI Desktop. If necessary, you can create relationships between aggregation tables with a third-party tool or a scripting solution via XMLA endpoints. Workspaces use the XML for Analysis (XMLA) protocol for communication between client applications and the engine that manages Power BI workspaces and datasets; these communications take place over what are known as XMLA endpoints. The same XMLA protocol is used by the Microsoft Analysis Services engine, which runs Power BI’s Semantic Modeling, Governance, Lifecycle, and Data Management. All data sent over the XMLA protocol is encrypted.
The majority of validations are enforced by disabling dropdown values and displaying an explanation in the tooltip.
Aggregation Tables are Hidden
Users with read-only access to the dataset cannot query aggregation tables. This avoids Row-Level Security (RLS) concerns. Consumers and queries refer to the detail table rather than the aggregation table, and they don’t need to be aware of the aggregation table at all.
As a result, aggregation tables are not visible in the Report View. When you select Apply All in the Manage Aggregations dialog, the table will be hidden if it isn’t already.
Storage Modes
- The table-level storage modes interact with the aggregation feature. DirectQuery, Import, and Dual storage modes are available for Power BI Tables.
- Import caches data in memory and sends queries to the cached data, whereas DirectQuery queries the backend directly. Power BI Aggregations are supported by all Power BI Import and non-multidimensional DirectQuery data sources.
- Select the Aggregated Table in the Power BI Desktop Model view and change the storage mode to Import to speed up queries. Expand Advanced in the Properties pane, then select Import from the drop-down menu under Storage mode. It’s important to note that this is an irreversible action.
RLS for Power BI Aggregations
- RLS Expressions must filter both the Aggregation Table and the Detail Table in order to work correctly for Power BI aggregations.
- Because Geography is on the filtering side of relationships to both the Sales table and the Sales Agg table, the RLS expression on the Geography table works for Power BI aggregations in the example below. RLS will be successfully applied to both queries that hit the Aggregation table and those that don’t.
- Only the detail Sales Table is filtered by an RLS expression on the Product table, not the aggregated Sales Agg table.
- Because the Aggregation table is a representation of the data in the Detail table, answering queries from the aggregation table would be insecure if the RLS filter could not be applied. Filtering only the detail table isn’t a good idea because aggregation hits won’t help user queries from this role.
- It’s not possible to use an RLS expression that only filters the Sales Agg Aggregation table and not the Sales Detail table.
- Because the detail table covers all the GroupBy columns in the Aggregation table, an RLS Expression applied to the detail table can be used to filter the Aggregation table.
- An RLS Filter on the Aggregation table, on the other hand, cannot be applied to the Detail table and is therefore forbidden.
Aggregation Based on Relationships
- Relationship-based Power BI aggregations are commonly used in dimensional models. The relationships between Dimension tables and Fact tables in Power BI datasets from Data Warehouses and Data Marts resemble Star/Snowflake schemas.
- The model in the following example uses only one data source. DirectQuery storage mode is used for tables. There are billions of rows in the Sales table. Caching would necessitate changing Sales’ storage mode to Import, which would consume a lot of memory and resources.
- Instead, create a Sales Agg table that holds the sum of SalesAmount grouped by CustomerKey, DateKey, and ProductSubcategoryKey; its row count is determined by the number of distinct combinations of those keys. Because the Sales Agg table is at a coarser grain than Sales, it could contain millions of rows instead of billions, making it much easier to manage.
- Sales Agg can be filtered using one-to-many or many-to-one relationships if the following dimension tables are the most commonly used for queries with high business value.
- Geography
- Customer
- Date
- Product Subcategory
- Product Category
This model can be seen in the image below.
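The grouping step that produces such an aggregation table can be sketched in Python (the column names follow the example above; the data values are made up for the demonstration):

```python
# Illustrative sketch: deriving a Sales Agg table by summing SalesAmount
# grouped by CustomerKey, DateKey, and ProductSubcategoryKey.
from collections import defaultdict

sales_rows = [
    # (CustomerKey, DateKey, ProductSubcategoryKey, SalesAmount)
    (1, 20190101, 10, 100.0),
    (1, 20190101, 10, 50.0),
    (2, 20190101, 11, 75.0),
]

totals = defaultdict(float)
for cust, date, subcat, amount in sales_rows:
    totals[(cust, date, subcat)] += amount

# One row per distinct key combination -- far fewer rows than the detail table.
sales_agg = [key + (total,) for key, total in sorted(totals.items())]
```

In a real model this grouping happens at the data source (a view or native query) or in Power Query; the sketch only shows why the aggregation table shrinks relative to the detail table.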
- The Sales Agg table’s aggregations are shown in the table below.
- When the aggregated Sales Agg table’s storage mode is set to Import, a dialog box appears stating that the related Dimension tables can be set to storage mode Dual.
- Depending on the subquery, setting the related Dimension tables to Dual allows them to act as either Import or DirectQuery. Consider the following scenario:
- The in-memory cache can return queries that aggregate metrics from the Import-mode Sales Agg table and group by attribute(s) from the related Dual tables.
- DirectQuery mode can return queries that aggregate metrics from the DirectQuery Sales table and group by attribute(s) from the related Dual tables. The query logic is passed down to the source database, including the GroupBy operation.
Regular vs Limited Relationships
- Regular relationships are required for Power BI aggregations based on relationships.
- When both tables are from the same source, regular relationships include the following storage mode combinations:
| Table on the many side | Table on the one side |
| --- | --- |
| Dual | Dual |
| Import | Import or Dual |
| DirectQuery | DirectQuery or Dual |
- A cross-source relationship is considered regular only when both tables are set to Import. Many-to-many relationships are always considered limited.
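The rules above can be codified as a small helper, which makes the combinations easier to check (an illustrative sketch of the table, not a Power BI API):

```python
# Illustrative sketch: is a one-to-many relationship "regular", given each
# table's storage mode and whether both tables come from the same source?
REGULAR_SAME_SOURCE = {
    # many side -> allowed storage modes on the one side
    "Dual": {"Dual"},
    "Import": {"Import", "Dual"},
    "DirectQuery": {"DirectQuery", "Dual"},
}

def is_regular(many_side, one_side, same_source=True):
    if not same_source:
        # Cross-source relationships are regular only when both are Import.
        return many_side == "Import" and one_side == "Import"
    return one_side in REGULAR_SAME_SOURCE.get(many_side, set())

print(is_regular("Import", "Dual"))        # regular
print(is_regular("DirectQuery", "Import")) # limited
```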
Relationship-Based Aggregation Query Examples
- Because the columns in the Date table are at the granularity that can hit the aggregation, the following query hits it. The sum aggregation is used in the SalesAmount column.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarYear],
"Sales", SUM('Sales'[SalesAmount])
)
- The following query does not return any results from the aggregation. Despite the fact that the query requests the sum of SalesAmount, it performs a GroupBy operation on a column in the Product table that isn’t granular enough to hit the aggregation.
- A Product subcategory can have multiple Product rows if you look at the relationships in the model. The query would have no way of knowing which product to aggregate to. The query reverts to DirectQuery in this case and sends a SQL query to the data source.
EVALUATE
SUMMARIZECOLUMNS(
'Product'[EnglishProductName],
"Sales", SUM('Sales'[SalesAmount])
)
- Power BI Aggregations aren’t just for quick sums. Calculations that are more complicated can also help. A complex calculation is conceptually broken down into subqueries for each SUM, MIN, MAX, and COUNT, with each subquery evaluated to see if it can hit the aggregation. Because of query-plan optimization, this logic may not hold true in all cases, but it should in general. The aggregation is demonstrated by the following example:
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarYear],
"Sales Amount/Count", DIVIDE(SUM('Sales'[SalesAmount]), COUNTROWS('Sales'))
)
- Power BI Aggregations are beneficial to the COUNTROWS function. Because a Count Table rows aggregation has been defined for the Sales table, the following query hits the aggregation.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarYear],
"Sales Count", COUNTROWS('Sales')
)
- Power BI Aggregations can be beneficial to the AVERAGE function. Because AVERAGE is internally folded to a SUM divided by a COUNT, the following query hits the aggregation. The aggregation is hit because the UnitPrice column has Power BI aggregations for both SUM and COUNT.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarYear],
"Avg Unit Price", AVERAGE('Sales'[UnitPrice])
)
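The fold of AVERAGE into SUM and COUNT can be sketched in Python (illustrative values, not Power BI internals):

```python
# Illustrative sketch: AVERAGE folds to SUM / COUNT, so it can be answered
# from an aggregation table that stores both pre-aggregates for the column.
unit_prices = [10.0, 20.0, 30.0, 40.0]

# Pre-aggregates, as they would be stored in the aggregation table.
unit_price_sum = sum(unit_prices)
unit_price_count = len(unit_prices)

# Answering AVERAGE from the pre-aggregates gives the same result as
# scanning the detail rows -- no detail-table scan is needed.
avg_from_aggregates = unit_price_sum / unit_price_count
avg_direct = sum(unit_prices) / len(unit_prices)
assert avg_from_aggregates == avg_direct
```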
- Power BI Aggregations can help the DISTINCTCOUNT function in some cases. The following query uses the aggregation because CustomerKey has a GroupBy entry, which keeps CustomerKey distinct in the Aggregation table.
- This method may still hit the performance ceiling, where queries with more than two to five million distinct values can degrade query performance. It is, however, useful in scenarios where the detail table has billions of rows but the column has two to five million distinct values. In this case, even if the table is cached in memory, the DISTINCTCOUNT can outperform scanning the table with billions of rows.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarYear],
"Customer Count", DISTINCTCOUNT('Sales'[CustomerKey])
)
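Why a GroupBy entry makes DISTINCTCOUNT answerable from the aggregation table can be sketched as follows (illustrative, made-up rows; not Power BI internals):

```python
# Illustrative sketch: when CustomerKey is a GroupBy column, every distinct
# CustomerKey value survives aggregation, so DISTINCTCOUNT over the (much
# smaller) aggregation table gives the same answer as over the detail table.
detail_rows = [
    # (CustomerKey, SalesAmount)
    (1, 10.0), (1, 20.0), (2, 5.0), (3, 7.0), (3, 1.0),
]

# The aggregation table keeps one row per distinct CustomerKey group.
agg_keys = {cust for cust, _ in detail_rows}

distinct_from_agg = len(agg_keys)
distinct_from_detail = len({cust for cust, _ in detail_rows})
```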
- Aggregation is supported by DAX Time Intelligence functions. The DATESYTD function generates a table of CalendarDay values, and the Aggregation table is at a granularity that covers Group-by columns in the Date table, so the following query hits the aggregation. This is a table-valued filter for the CALCULATE function, which can work with Power BI aggregations.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[Calendar Month],
'Product Category'[CategoryName],
"Sales", CALCULATE(SUM('Sales'[SalesAmount]), DATESYTD('Date'[CalendarDay]))
)
Aggregation Based on GroupBy Columns
- Dimensional models differ from Hadoop-based Big Data models in their characteristics. Big Data models frequently denormalize dimension attributes to fact tables instead of using relationships to avoid joins between large tables.
- Using Power BI aggregations based on GroupBy columns, such big data models can be unlocked for interactive analysis.
- The table below lists the numeric Movement column to be aggregated; the remaining columns are grouping attributes. The table contains a large volume of IoT data and uses DirectQuery storage mode. Because of the sheer size of the dataset, queries on the data source that aggregate across it are slow.
- You can add an Aggregation table to this dataset to enable interactive analysis by grouping by most of the attributes but excluding the high-cardinality attributes like Longitude and Latitude. The number of rows is drastically reduced, and the data is small enough to fit into an in-memory cache.
- In the Manage Aggregations dialog, you define the aggregation mappings for the Driver Activity Agg table.
- The GroupBy entries are required in Power BI aggregations based on GroupBy columns. The Power BI aggregations will not be hit if they are not included. The GroupBy entries are not required when using Power BI aggregations based on relationships.
- The aggregates for the Driver Activity Agg table are shown in the table below.
- The aggregated Driver Activity Agg table can be stored using the Import storage mode.
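The coverage rule for GroupBy-based aggregations can be sketched as a subset check (illustrative only; "Activity Date" comes from the example, while the other GroupBy entries here are hypothetical stand-ins):

```python
# Illustrative sketch: with GroupBy-based aggregations, a query can hit the
# aggregation table only if every column it groups by is among the table's
# GroupBy entries. "Vehicle" and "Region" below are hypothetical entries.
AGG_GROUPBY = {"Activity Date", "Vehicle", "Region"}

def can_hit(query_groupby_columns):
    return set(query_groupby_columns) <= AGG_GROUPBY

print(can_hit(["Activity Date"]))               # covered -> hit
print(can_hit(["Activity Date", "Longitude"]))  # high-cardinality column excluded -> miss
```

This is why excluding Longitude and Latitude shrinks the cache but also means queries grouping by those columns fall back to DirectQuery.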
GroupBy Aggregation Query Example
- Because the Activity Date column is covered by the aggregation table, the following query hits the aggregation. The COUNTROWS function uses the Count table rows aggregation.
EVALUATE
SUMMARIZECOLUMNS(
'Driver Activity'[Activity Date],
"Location Count", COUNTROWS('Driver Activity')
)
- It’s a good idea to define Count table rows aggregations, especially for models that have filter attributes in Fact tables. Power BI may submit queries using COUNTROWS even when the user doesn’t explicitly request it; for example, the filter dialog displays the count of rows for each value.
Combined Aggregation Techniques
- Power BI Aggregations can be created using a combination of relationships and GroupBy columns. Denormalized Dimension tables may need to be split into multiple tables for Power BI aggregations based on relationships. If this is too expensive or inconvenient for some dimension tables, you can duplicate the required attributes in the aggregation table for those dimensions and rely on relationships for others.
- Month, Quarter, Semester, and Year, for example, are replicated in the Sales Agg table by the following model. Sales Agg and the Date table have no relationship, but Customer and Product Subcategory do. Sales Agg uses the Import storage mode.
- The Manage Aggregations dialog for the Sales Agg table has the following entries. To hit Power BI aggregations for queries that are grouped by the Date attributes, the GroupBy entries where Date is the Detail table must be filled in. Because of the presence of relationships, the GroupBy entries for CustomerKey and ProductSubcategoryKey have no impact on aggregation hits, with the exception of DISTINCTCOUNT.
Combined Aggregation Query Examples
- The aggregation table covers CalendarMonth, and CategoryName is accessible via one-to-many relationships, so the following query hits the aggregation. The SUM aggregation is applied to SalesAmount.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[Calendar Month],
'Product Category'[CategoryName],
"Sales", SUM('Sales'[SalesAmount])
)
- Because the aggregation table does not cover CalendarDay, the following query does not hit the aggregation.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[CalendarDay],
'Product Category'[CategoryName],
"Sales", SUM('Sales'[SalesAmount])
)
- Because the DATESYTD function generates a table of CalendarDay values and the aggregation table does not cover CalendarDay, the following Time-Intelligence query does not hit the aggregation.
EVALUATE
SUMMARIZECOLUMNS(
'Date'[Calendar Month],
'Product Category'[CategoryName],
"Sales", CALCULATE(SUM('Sales'[SalesAmount]), DATESYTD('Date'[CalendarDay]))
)
Aggregation Precedence
- Aggregation precedence allows a single subquery to consider multiple aggregation tables.
- A composite model with multiple sources is shown in the following example:
- The Driver Activity DirectQuery table stores over a trillion rows of IoT data sourced from a big-data system. It serves drillthrough queries for viewing individual IoT readings in controlled filter contexts.
- The Driver Activity Agg table is an intermediate aggregation table in DirectQuery mode over Azure Synapse Analytics (formerly SQL Data Warehouse). It has over a billion rows and is optimized at the source with columnstore indexes.
- Because the group-by attributes are few and have low cardinality, the Agg2 Import table has a high granularity. The number of rows could be in the thousands, allowing it to fit into an in-memory cache with ease. Because a high-profile executive dashboard uses these attributes, queries referencing them should be as quick as possible.
- DirectQuery aggregation tables that use a different data source from the detail table are supported only if the aggregation table is from a SQL Server, Azure SQL, or Azure Synapse Analytics (formerly SQL Data Warehouse) source.
- This model has a small memory footprint, but it unlocks a massive dataset. It is a balanced architecture because it distributes the query load across the architecture’s components, utilizing them according to their strengths.
- In the Manage Aggregations dialog, the Precedence field for Driver Activity Agg2 is set to 10, which is higher than for Driver Activity Agg.
- Because of the higher precedence setting, queries that use Power BI aggregations will consider Driver Activity Agg2 first. Subqueries that aren’t at a granularity Driver Activity Agg2 can answer will consider Driver Activity Agg instead, and detail queries that neither aggregation table can answer go to Driver Activity.
- Because chained Power BI aggregations are not permitted, the table specified in the Detail Table column is Driver Activity rather than Driver Activity Agg.
- The Power BI aggregations for the Driver Activity Agg2 table are shown in the table below.
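The precedence-driven selection can be sketched as follows (an illustrative model of the behavior described above, not Power BI internals; the coverage sets are simplified stand-ins):

```python
# Illustrative sketch of aggregation precedence: consider candidate
# aggregation tables from highest precedence down, and use the first one
# that covers the subquery's group-by columns; otherwise fall back to the
# detail table.
AGG_TABLES = [
    # (name, precedence, columns the table can group by -- hypothetical)
    ("Driver Activity Agg2", 10, {"Date", "Region"}),
    ("Driver Activity Agg", 0, {"Date", "Region", "Vehicle", "Driver"}),
]

def pick_table(groupby_columns):
    needed = set(groupby_columns)
    for name, _prec, covered in sorted(AGG_TABLES, key=lambda t: -t[1]):
        if needed <= covered:
            return name
    return "Driver Activity"  # the detail table answers everything else

print(pick_table(["Date"]))            # small, high-precedence cache wins
print(pick_table(["Date", "Driver"]))  # falls through to the bigger agg table
print(pick_table(["Longitude"]))       # neither agg covers it -> detail table
```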
Detect if Queries Hit or Miss Power BI Aggregations
SQL Profiler can tell if a query is returned from the in-memory cache storage engine or pushed to the data source by DirectQuery. The same method can be used to determine whether or not Power BI aggregations have been hit.
The relevant extended event in SQL Profiler is Query Processing\Aggregate Table Rewrite Query.
An example of the event output when aggregation is used is shown in the JSON snippet below.
- matchingResult shows whether the subquery used an aggregation.
- dataRequest shows the GroupBy column(s) and aggregated column(s) the subquery used.
- mapping shows the columns in the aggregation table that were mapped to.
{
  "table": "Sales",
  "mapping": {
    "table": "Sales Agg"
  },
  "matchingResult": "matchFound",
  "dataRequest": [
    {
      "table": "Date",
      "column": "CalendarYear",
      "mapping": {
        "table": "Date",
        "column": "CalendarYear"
      }
    },
    {
      "aggregation": "sum",
      "table": "Sales",
      "column": "SalesAmount",
      "mapping": {
        "column": "SalesAmount_Sum"
      }
    }
  ]
}
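A sketch of inspecting such an event payload programmatically (the payload below mirrors the example; treat field availability as an assumption when adapting this to real trace output):

```python
# Illustrative sketch: parse an Aggregate Table Rewrite Query event payload
# and report whether the subquery hit an aggregation.
import json

event = json.loads("""
{
  "table": "Sales",
  "mapping": {"table": "Sales Agg"},
  "matchingResult": "matchFound",
  "dataRequest": [
    {"table": "Date", "column": "CalendarYear",
     "mapping": {"table": "Date", "column": "CalendarYear"}},
    {"aggregation": "sum", "table": "Sales", "column": "SalesAmount",
     "mapping": {"column": "SalesAmount_Sum"}}
  ]
}
""")

hit = event["matchingResult"] == "matchFound"
agg_table = event["mapping"]["table"] if hit else None
print(f"aggregation hit: {hit}, answered from: {agg_table}")
```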
Keep Caches in Sync
- If the in-memory cache is not kept in sync with the source data, aggregates that combine DirectQuery, Import, and/or Dual storage modes may return different data.
- For example, query-time techniques such as filtering DirectQuery results to match cached values are not used to mask data issues. If necessary, such problems should be handled at the source.
- Only use performance optimizations if they do not jeopardize your ability to meet business requirements. It is your responsibility to understand and design your data flows.
Considerations and Limitations
Starting in July 2022, due to changes in functionality, Power BI no longer uses Power BI aggregations for tables with an SSO-enabled data source, because security controls cannot be enforced. For the best query performance with Power BI aggregations, it’s recommended that you turn off SSO for these data sources.
Power BI Aggregations: Automatic Aggregations
Automatic Aggregations continuously optimize DirectQuery datasets for maximum report query performance using cutting-edge Machine Learning (ML). Automatic aggregations are built on top of the User-Defined aggregations infrastructure that was first introduced with Power BI composite models.
Automatic aggregations, unlike User-Defined aggregations, do not necessitate extensive Data Modeling and Query Optimization skills to set up and maintain. Automatic aggregations are Self-teaching and Self-optimizing at the same time. They allow dataset owners of any skill level to improve query performance, allowing even the largest datasets to have faster report visualizations.
With automatic aggregations, you’ll be able to:
- Fast Report Visualizations: An automatically maintained in-memory Power BI aggregations cache, rather than backend data source systems, returns an optimal percentage of report queries. Outlier queries that the in-memory cache cannot answer are passed directly to the data source via DirectQuery.
- Scalability: Compared to pure DirectQuery mode, the Power BI query engine and the in-memory Power BI aggregations cache return the majority of query results. This can significantly reduce the query-processing load on data source systems at peak reporting times, increasing scalability in the data source backend.
- Simple Setup: Dataset owners enable automatic aggregations training and schedule one or more dataset refreshes. With the first training and refresh, automatic aggregations begin building an aggregations framework and optimal aggregations. The system adjusts itself automatically over time.
- Fine-tuning: Using a simple and intuitive interface in the dataset settings, you can estimate the performance gains for a different percentage of queries returned from the in-memory Power BI aggregations cache and make adjustments for even greater gains. A single slider control makes fine-tuning for your environment simple.
Requirements
Supported plans
Automatic aggregations are supported for Power BI Premium per Capacity, Premium per User, and Power BI Embedded datasets.
Supported Data Sources
For the following data sources, Automatic aggregations are available:
Supported Modes
Automatic aggregations are available for DirectQuery mode datasets. Composite model datasets can combine Import tables and DirectQuery connections, but only the DirectQuery connection supports automatic aggregations.
Permissions
You must own the Dataset in order to enable and configure Automatic aggregations. To configure Automatic Aggregation settings, workspace admins can take over as owners of a dataset.
Configuring Automatic Aggregations
You set up automatic aggregations in the dataset Settings. To get started, enable automatic aggregations training and schedule one or more refreshes. Trying the feature on a dataset is a good way to understand how automatic aggregations work and decide whether they’re right for your situation.
See Configure automatic aggregations for step-by-step instructions on how to enable Automatic aggregations training, configure a refresh schedule, and fine-tune for your environment.
Benefits
- When a dataset user opens a report or interacts with a report visualization, DAX queries are passed to the query engine, which then sends SQL queries to the backend data source. Each query must then be calculated and returned by the data source.
- DirectQuery data source round trips, when compared to in-memory import mode datasets, can be time and process-intensive, resulting in slow query response times in report visualizations.
- Automatic aggregations can improve report query performance by avoiding data source query round trips when they are enabled for a DirectQuery dataset. Instead of being sent to and returned by the data source, pre-aggregated query results are returned automatically by an in-memory Power BI aggregations cache.
- The amount of pre-aggregated data in the in-memory aggregations cache is a tiny fraction of the data stored in the data source’s Fact and Detail tables. Not only is report query performance improved, but the load on backend data source systems is reduced as well.
- Just as with pure DirectQuery mode, only the small portion of report and ad-hoc queries that require aggregations not included in the in-memory cache is passed to the backend data source.
Automatic Query and Aggregations Management
While Automatic aggregations eliminate the need for User-Defined Aggregations tables and make implementing a pre-aggregated data solution significantly easier, understanding how automatic aggregations work requires a deeper understanding of the underlying processes and dependencies. To create and manage Automatic aggregations in Power BI, you’ll need the following.
Query Log
Power BI keeps track of dataset and user report queries in a query log, retaining seven days of query log data per dataset. Query log data is rolled forward each day. The query log is secured and cannot be accessed by users or through the XMLA endpoint.
Training Operations
- Power BI performs a training operation on the query log as part of the first scheduled dataset refresh operation for your selected frequency (Day or Week). This ensures that aggregations in the in-memory aggregations cache adapt to changing query patterns.
- Special queries are sent to the data source to determine which aggregations to include in the cache, and in-memory Aggregations tables are Created, Updated, or Dropped. Calculated Aggregations Data, on the other hand, is loaded into the in-memory cache during refresh rather than during training.
- If you choose the Day frequency and schedule refreshes at 4:00 AM, 9:00 AM, 2:00 PM, and 7:00 PM, only the 4:00 AM refresh includes both a training and a refresh operation that day. The scheduled refreshes at 9:00 AM, 2:00 PM, and 7:00 PM are refresh-only operations that update the cache’s existing aggregations.
- Training operations evaluate past queries from the query log, and the results are typically accurate enough that future queries are covered. There is no guarantee, however, because future queries may differ from those derived from the query log, and such queries may not be returned by the in-memory aggregations cache.
- DirectQuery is used to pass queries to the data source that were not returned by the in-memory aggregations cache. Aggregations for those new queries may be added to the in-memory aggregations cache with the next training operation, depending on the frequency and ranking of those new queries.
- There is a 60-minute time limit for the training operation. A notification is logged in the dataset Refresh History if training is unable to process the entire query log within the time limit. Training resumes the next time it is launched. When the entire query log is processed, the training cycle finishes and replaces the existing automatic aggregations.
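The training-versus-refresh schedule described above can be sketched as follows (illustrative only; the times match the Day-frequency example):

```python
# Illustrative sketch: with Day frequency, only the first scheduled refresh
# of the day runs training; the remaining refreshes are refresh-only.
scheduled = ["4:00 AM", "9:00 AM", "2:00 PM", "7:00 PM"]

operations = [
    (time, "training + refresh" if i == 0 else "refresh only")
    for i, time in enumerate(scheduled)
]

for time, op in operations:
    print(f"{time}: {op}")
```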
Refresh Operations
- After the training operation completes as part of the first scheduled refresh for your selected frequency, Power BI runs a Refresh operation that queries and loads new and updated aggregations data into the in-memory aggregations cache and removes any aggregations that no longer rank high enough, as described above (as determined by the training algorithm).
- All subsequent refreshes for the Day or Week frequency you selected are Refresh-only operations that query the data source to update existing aggregations data in the cache.
- In our example, the scheduled refreshes at 9:00 AM, 2:00 PM, and 7:00 PM are refresh-only operations.
Training on Demand
A Training Cycle may not finish within the time constraints of a single Data Refresh cycle, as previously stated. You can trigger automatic aggregations training on-demand by clicking Train and Refresh Now in Dataset Settings if you don’t want to wait for the next scheduled refresh cycle that includes training.
When you use Train and Refresh Now, you’ll get a refresh as well as training. Before running an additional on-demand training and refresh operation, check the Dataset Refresh history to see if the current operation has been completed.
Refresh History
The Dataset Refresh history keeps track of each Refresh operation. Important information about each refresh is displayed, such as how much memory the cache’s aggregations consume for the configured query percentage. On the Dataset Settings page, click Refresh History to see the refresh history. Click Show Details if you want to dig a little deeper.
You can ensure that your scheduled refresh operations are completed within an acceptable time frame by checking your refresh history on a regular basis. Before the next scheduled refresh begins, double-check that all refresh operations have been completed successfully.
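If you prefer to check programmatically, refresh history is also exposed through the Power BI REST API's Get Refresh History endpoint. The sketch below builds the endpoint URL and filters failed runs from a response body; the workspace/dataset IDs and the sample payload are illustrative placeholders, and authentication (a bearer token) is omitted.

```python
import json

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def refresh_history_url(group_id, dataset_id, top=10):
    """Build the Get Refresh History URL for a dataset in a
    workspace ('group'). Call it with any HTTP client plus an
    Azure AD bearer token (not shown here)."""
    return f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes?$top={top}"

def failed_refreshes(history_json):
    """Pick out failed runs from a Get Refresh History response body."""
    return [r for r in json.loads(history_json)["value"] if r["status"] == "Failed"]

# Illustrative response shape (placeholder data, not a real dataset):
sample = json.dumps({"value": [
    {"refreshType": "Scheduled", "status": "Completed"},
    {"refreshType": "Scheduled", "status": "Failed"},
]})
print(refresh_history_url("<workspace-id>", "<dataset-id>"))
print(len(failed_refreshes(sample)))  # 1
```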
Training and Refresh Failures
- While Power BI performs Training and Refresh Operations as part of the first scheduled dataset refresh for the Day or Week frequency you specify, these operations are handled separately. If a training operation runs out of time to fully process the query log, Power BI will use the previous training state to refresh the existing aggregations (and regular tables in a composite model).
- The Refresh History, in this case, will show that the refresh was successful, and training will resume processing the query log the next time training launches.
- If client report query patterns changed and aggregations didn’t adjust yet, query performance might be less optimized, but the achieved level of performance should still be far better than a pure DirectQuery dataset with no aggregations.
- If a training operation takes too many cycles to finish processing the query log, reduce the percentage of queries that use the in-memory aggregations cache in Dataset Settings. Fewer aggregations will be created in the cache, but training and refresh operations will have more time to complete.
- If training succeeds but the refresh fails, the entire dataset refresh is marked as Failed, because the result would be an unavailable in-memory aggregations cache.
You can specify email notifications in case of refresh failures when scheduling refresh.
Query Caching and Automatic Aggregations
- Query Caching is also supported in Power BI Premium/Embedded to keep query results up to date. Automatic aggregations are not the same as query caching.
- Power BI Premium implements query caching using its local caching service, whereas automatic aggregations are implemented at the Dataset level. Because query caching only caches queries for the first-page load of a report, query performance isn’t improved when users interact with the report.
- Automatic aggregations, on the other hand, optimize most report queries by caching aggregated query results in advance, including those generated when users interact with reports. It’s possible to enable query caching and automatic aggregations for a dataset, but it’s unlikely to be necessary.
Monitor with Azure Log Analytics
- Power BI can save activity logs using Azure Log Analytics (LA), which is a service within Azure Monitor. You can collect, analyze, and act on telemetry data from Azure and on-premises environments using the Azure Monitor Suite. It comes with Long-term Storage, an Ad-hoc Query Interface, and API Access for data export and integration.
- You can evaluate the success rate of your automatic aggregations if Power BI is configured with an Azure LA account. You can check whether report queries are answered from the in-memory cache, among other things.
- To use this feature, download the PBIT template and connect it to your Log Analytics account. The report presents data at three levels: a Summary view, a DAX Query Level view, and a SQL Query Level view.
- The summary page for all of the queries is shown in the image below. As you can see, the marked chart depicts the percentage of total queries satisfied by aggregations vs. those that required the use of a data source.
- The next step in drilling deeper is aggregation use at the DAX query level. Right-click a DAX query in the list (bottom left) and select Drill through > Query History.
- This gives you a list of all the relevant queries. Drill down to the next level to see more aggregation details.
Application Lifecycle Management
Datasets with automatic aggregations enabled introduce unique requirements for ALM solutions, from development to test and from test to production.
Deployment Pipelines
Power BI can copy datasets and dataset configuration from the current stage to the target stage when using deployment pipelines. However, because the settings do not transfer from the current to the target stage, automatic aggregations must be reset in the target stage. The deployment pipelines’ REST APIs can also be used to programmatically deploy content.
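As a sketch of the programmatic route, the deployment pipelines "Deploy All" REST call takes the source stage and deployment options in its request body. The helper below only constructs the URL and payload (the pipeline ID is a placeholder, authentication is omitted, and the option names should be checked against the current REST API reference):

```python
import json

def deploy_all_request(pipeline_id, source_stage_order=0):
    """Build the URL and JSON body for the deployment pipelines
    'Deploy All' REST call. source_stage_order 0 deploys from the
    Development stage, 1 from Test. Send with POST plus a bearer token."""
    url = f"https://api.powerbi.com/v1.0/myorg/pipelines/{pipeline_id}/deployAll"
    body = {
        "sourceStageOrder": source_stage_order,
        "options": {
            "allowCreateArtifact": True,     # create items missing in the target stage
            "allowOverwriteArtifact": True,  # overwrite items that already exist
        },
    }
    return url, json.dumps(body)

url, body = deploy_all_request("<pipeline-id>")
print(url)
```

Remember that, as noted above, automatic aggregations training must still be re-enabled on the dataset in the target stage after deployment.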
Custom ALM Solutions
Keep in mind that if you use a Custom ALM Solution based on XMLA endpoints, your solution may be able to copy both system-generated and user-created aggregations tables as part of the Dataset Metadata. However, after each deployment step at the target stage, you must manually enable automatic aggregations. If you overwrite an existing dataset, Power BI will remember your settings.
Altering a Dataset
When you add or remove tables from a dataset with automatic aggregations enabled via XMLA endpoints, Power BI preserves any existing aggregations that it can and removes those that are no longer needed or relevant. Query performance may suffer until the next training phase is initiated.
Metadata Elements
- Unique system-generated aggregations tables are included in datasets with automatic aggregations enabled. Users can’t see Aggregations tables in Reporting tools. They can be seen using tools that use Analysis Services Client Libraries Version 19.22.5 or higher and the XMLA endpoint.
- Upgrade your data modeling and administration tools to the latest version of the client libraries when working with datasets that have automatic aggregations enabled. Upgrade to Version 18.9.2 or higher of SQL Server Management Studio (SSMS). These datasets can’t be enumerated or scripted out in earlier versions of SSMS.
- A SystemManaged table property in the Tabular Object Model (TOM) in Analysis Services Client Libraries Version 19.22.5 and higher is used to identify automatic aggregations tables. The SystemManaged property for automatic aggregations tables is set to true in the following code snippet, while it is set to false for regular tables.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AnalysisServices.Tabular;

namespace AutoAggs
{
    class Program
    {
        static void Main(string[] args)
        {
            string workspaceUri = "<Specify the URL of the workspace where your dataset resides>";
            string datasetName = "<Specify the name of your dataset>";

            Server sourceWorkspace = new Server();
            sourceWorkspace.Connect(workspaceUri);
            Database dataset = sourceWorkspace.Databases.GetByName(datasetName);

            // Enumerate system-managed (automatic aggregations) tables.
            IEnumerable<Table> aggregationsTables = dataset.Model.Tables.Where(tbl => tbl.SystemManaged == true);

            if (aggregationsTables.Any())
            {
                Console.WriteLine("The following auto aggs tables exist in this dataset:");
                foreach (Table table in aggregationsTables)
                {
                    Console.WriteLine($"\t{table.Name}");
                }
            }
            else
            {
                Console.WriteLine("This dataset has no auto aggs tables.");
            }

            Console.WriteLine("\r\nPress [Enter] to exit the sample app...");
            Console.ReadLine();
        }
    }
}
- When you run this snippet, it lists in the console all of the automatic aggregations tables that currently exist in the dataset.
- Keep in mind that as training operations determine which aggregations to include in the in-memory aggregations cache, aggregations tables are constantly changing.
- Power BI fully manages System-generated Table objects for automatic aggregations. Do not alter or delete these tables on your own. This may result in a decrease in performance.
- Power BI keeps the automatic aggregations configuration outside of the dataset itself. The presence of a system-managed aggregations table in a dataset therefore does not imply that the dataset is enabled for automatic aggregations training.
- To put it another way, if you script out a full model definition for a dataset with automatic aggregations enabled and then create a new copy of the dataset (with a different name/workspace/capacity), the new dataset will not be ready for automatic aggregations training. In Dataset Settings, you must still turn on automatic aggregations training for the new dataset.
Considerations and Limitations
Keep the following in mind when utilizing Automatic aggregations:
- The SQL queries generated during the initial training phase can cause the Data Warehouse to experience a significant load. Consider temporarily scaling up your Data Warehouse to meet the training demand if training keeps finishing incompletely and you can verify that the queries are encountering a timeout on the Data Warehouse side.
- The most recent data at the data source may not be used to calculate aggregates stored in the in-memory aggregations cache. There is a latency between data source updates and aggregations data stored in the in-memory aggregations cache, which is unlike pure DirectQuery and more like regular import tables. While some latency will always exist, it can be reduced with a good refresh schedule.
- Set all dimension tables to Dual mode and leave fact tables in DirectQuery mode to improve performance even more.
- Power BI Pro, Azure Analysis Services, and SQL Server Analysis Services do not support automatic aggregations.
- Downloading datasets with automatic aggregations enabled is not supported by Power BI. You can no longer download a Power BI Desktop (.pbix) file if you uploaded or published it to Power BI and then enabled automatic aggregations. Make a backup of the PBIX file on your computer.
- In Azure Synapse Analytics, automatic aggregations with external tables are not yet supported. The following SQL query can be used to enumerate external tables in Synapse:
SELECT SCHEMA_NAME(schema_id) AS schema_name, name AS table_name
FROM sys.external_tables;
- Only datasets with enhanced metadata are eligible for automatic aggregation. If you want to use automatic aggregations with an older dataset, you must first upgrade it to enhanced metadata.
- If the DirectQuery data source is configured for single sign-on and uses dynamic data views or security controls to limit the data a user can access, disable automatic aggregations.
- Because automatic aggregations are unaware of these Data Source-level controls, it is impossible to ensure that accurate data is provided to each individual user. In the refresh history, Training will log a warning that it detected a data source configured for single sign-on and skipped the tables that use this data source. Disable SSO for these data sources if possible to benefit from the improved query performance that Automatic aggregations can provide.
- To avoid unnecessary processing overhead, disable automatic aggregations if the dataset contains only hybrid tables. A hybrid table uses both import partitions and a DirectQuery partition. A common scenario is incremental refresh with real-time data, in which a DirectQuery partition retrieves transactions that occurred at the data source after the last data refresh. Power BI, however, imports aggregations during refresh, so automatic aggregations cannot include transactions that occurred after the last data refresh. Training will record a warning in the refresh history that hybrid tables were detected and skipped.
- For automatic aggregations, calculated columns are ignored. The corresponding report queries will not hit the in-memory aggregations cache if you use a calculated column in DirectQuery mode, such as by using the COMBINEVALUES DAX function to create a relationship based on multiple columns from two DirectQuery tables.
- Only the Power BI service supports automatic aggregations. System-generated aggregations tables are not created by Power BI Desktop.
- If you make changes to a dataset’s metadata while automatic aggregations are enabled, query performance may suffer until the next training process is started. As a best practice, disable automatic aggregations, make the necessary changes, and then retrain.
- Unless you’ve disabled automatic aggregations and are cleaning up the dataset, don’t change or delete system-generated aggregations tables. The system is in charge of these objects’ management.
Conclusion
This article discussed Power BI Aggregations extensively, thoroughly explaining both User-Defined Aggregations and Automatic Aggregations. In addition, it gave a brief overview of Power BI and its key features.
Harshitha is a dedicated data analysis fanatic with a strong passion for data, software architecture, and technical writing. Her commitment to advancing the field motivates her to produce comprehensive articles on a wide range of topics within the data industry.