How to Optimize Power BI Data Models for Large Datasets

Table of Contents

1. Use a Large Public Dataset for Testing
2. Reduce Model Size by Removing Unused Data
3. Choose the Right Storage Mode (Import vs. DirectQuery vs. Dual)
4. Implement Aggregations for Better Query Performance
5. Use Incremental Refresh for Large Datasets
6. Apply Compression and Encoding Optimization
7. Analyze the Power BI Data Model with VertiPaq Analyzer & Power Query Diagnostics
- 7.1 VertiPaq Analyzer – Inspecting Compression & Memory Usage
- 7.2 Power Query Diagnostics – Ensuring Query Folding & Efficient Refresh
8. Case Study: Optimizing a Large Power BI Data Model – Before & After
Power BI Data Models Optimization for Large Datasets: Next Steps

Handling massive data in Power BI requires more than a good star schema – it demands careful optimization. This third post of our “Power BI Performance Optimization” series walks through advanced techniques to optimize Power BI data models at scale. We’ll demonstrate each step using a large public dataset so you can follow along. We assume you’re already comfortable with basic modeling (facts/dimensions) and focus on high-impact optimizations for performance. By the end, you’ll see how applying these techniques can dramatically reduce model size, speed up refreshes, and improve report responsiveness. (Dataset used: NYC Yellow Taxi Trips – a public dataset with millions of records.)

Explore B EYE’s Power BI Services

Flowchart illustrating seven key strategies to optimize Power BI data models, including storage mode selection, incremental refresh, and query performance improvements.

1. Use a Large Public Dataset for Testing

To practice these optimizations, start with a sizable open dataset. In this guide, we use the NYC Yellow Taxi Trip Data, which contains millions of taxi ride records. You can download a chunk of this data (e.g. one year’s worth of trips) from the NYC Open Data portal or other public data repositories. Load it into Power BI Desktop as your fact table. Add any relevant dimension tables (e.g. Date, Taxi Zones, etc.) or use existing lookup info in the dataset. This large dataset will serve as our playground for applying and measuring the impact of various optimization techniques.

Why this dataset? Its size and detail (e.g. every taxi ride with timestamps, locations, fares) will illustrate how model tweaks affect performance. Feel free to use a different big dataset if you prefer (for example, a multi-million row sales table). The key is to have a substantial volume of data to see the improvements from each step.

2. Reduce Model Size by Removing Unused Data

One of the first optimizations is data reduction – trim anything not needed for analysis. Power BI’s VertiPaq engine already compresses data (~10x is common, so 10 GB raw might be ~1 GB in memory), but you should still minimize what you load. Removing unnecessary data lowers memory usage and boosts efficiency.

Diagram showcasing vertical filtering (removing unused columns) and horizontal filtering (excluding unnecessary rows) to optimize Power BI model size.

Remove Unused Columns (Vertical Filtering): Go through each table and remove columns that aren’t used for reporting or model logic. Every column costs memory, especially high-cardinality text fields. According to Microsoft’s guidance, any column that doesn’t serve a reporting or relationship purpose is a candidate for removal. For example, drop audit fields, GUIDs, or redundant textual info if your reports don’t need them. Designing the model with exactly the columns required (and no more) keeps it lean. It’s easier to add a column later than to remove it in a broken report, so start with a minimal set.
Remove Unneeded Rows (Horizontal Filtering): Likewise, limit the rows [SA1] you import. It’s a great idea to define the granularity of the reporting. If there is no need of detailed reporting on daily basis, for example, we can load directly the aggregate data from the beginning. So, if historical data beyond a certain period isn’t needed, filter it out in Power Query. Microsoft calls this horizontal filtering. For instance, you might keep only the last 5 years of data instead of 10, especially if older data isn’t actively used in analysis. Don’t load all history “just in case” – it can bloat your model with little benefit. You can always adjust the date range parameters to pull more history if requirements change. Similarly, if the dataset contains multiple categories or regions but your report focuses on one, consider filtering to that subset (this can also simplify security if it negates the need for row-level security on regions).

By removing excess columns and rows, you shrink the model size and memory footprint. This yields multiple benefits: smaller models refresh faster and put less pressure on capacity resources, and queries run quicker on fewer rows. In our taxi data example, you might remove columns like taxi IDs or unused fare breakdown fields, and perhaps filter out trips from very old years that aren’t needed. These simple trims do wonders for performance, often cutting model size dramatically with zero impact on the actual analysis.

You May Also Like: Identify Power BI Performance Bottlenecks Using the Performance Analyzer: A Step-by-Step Tutorial

3. Choose the Right Storage Mode (Import vs. DirectQuery vs. Dual)

Power BI offers different storage modes for tables – understanding and choosing the right mode is critical for large datasets:

Table listing four Power BI storage modes: Import Mode, Dual Mode, DirectQuery Mode, and Direct Lake Storage, each with distinct performance benefits.

Import Mode: This fully loads the data into Power BI’s memory (VertiPaq). It usually gives the fastest query performance since everything’s in-memory. For example, slicing a visual by trip distance or fare will be very fast if the data is imported. However, import mode uses more memory and requires refreshes to pull updated data. Very large tables might not fit in memory if entirely imported (Pro has a ~1 GB dataset limit, Premium allows larger). Use import for tables that are small enough or require super-fast aggregation queries.
DirectQuery Mode: In DirectQuery, data stays in the source (e.g. SQL database) and queries are sent to the source on the fly. The Power BI model only stores metadata. This drastically reduces the Power BI model size – a DirectQuery table consumes no cache memory in Power BI . It enables analysis on huge datasets that would be impractical to fully import. The trade-off is query performance: each interaction triggers a live query to the backend, so performance depends on the source’s speed and network. It’s best used when data is too large to import or needs near real-time updates (for example, if you need up-to-the-minute taxi data). Be mindful of potential performance bottlenecks and source load, and consider aggregations (next section) to mitigate slow queries.
Dual Mode: Dual is a hybrid setting (available for tables in composite models) where a table can act as either Import or DirectQuery depending on the context. A Dual table is cached in memory and can propagate filters to DirectQuery tables. This mode is typically used for dimension tables in a mixed model. For instance, if you keep your huge fact table in DirectQuery but import your Date and Location dimension tables, setting those dimensions to Dual lets them serve cached data for their own fields and also join with the DirectQuery fact on-demand. Dual mode improves performance in composite models by reducing the number of cross-source join operations (which are limited). In short, Dual dimensions give you the best of both – quick slicers on those fields (from in-memory data) and the ability to filter DirectQuery fact data. Use Dual carefully and only on tables that make sense to cache; it’s a performance tweak that should not compromise data correctness.
Direct Lake Storage Mode: Direct Lake functions as a storage mode variant for Power BI semantic model tables housed within a Microsoft Fabric workspace. Its design is optimized to facilitate rapid in-memory loading of substantial data volumes sourced from Delta tables. These Delta tables, in turn, archive their data as Parquet files within OneLake—the unified storage repository for all analytics data. Once data populates memory, the semantic model empowers high-performance querying.

Choosing wisely: For our example, suppose the taxi fact table is extremely large (hundreds of millions of rows). One approach is a composite model: keep an aggregated version of the data in Import and the detailed trip records in DirectQuery. Important lookup tables (Date, Taxi Zone) can be Dual. This way summary queries hit the fast in-memory data, while drill-downs to detail query the source on demand. The right storage mode mix depends on your use case: aim to cache what you feasibly can for speed, and DirectQuery the rest. Power BI lets you set storage mode per table, so you have granular control.

4. Implement Aggregations for Better Query Performance

Even with DirectQuery, you can achieve lightning-fast queries on large data by using aggregation tables. Aggregations in Power BI allow you to pre-summarize big fact tables into smaller ones, and have Power BI automatically use those summaries for high-level queries.

How it works: You create a new table that is a grouped/aggregated version of your fact data (e.g. total trips and revenue per year/month, or per zone). This aggregation table is typically loaded in import mode (in-memory). You then define it as an aggregation of the big fact table using Manage Aggregations in Power BI. When a user runs a query that can be answered by the summary (for example, total rides by month), Power BI’s engine will hit the aggregation table instead of the detailed table. The result: vastly improved query performance, because scanning a few thousand aggregated rows in memory is much faster than scanning a hundred million raw rows via DirectQuery.

For instance, our taxi dataset might have an aggregation table at the daily level: number of trips, average fare, etc. per day. This table might be only tens of thousands of rows (one per date) instead of the full trip table. With aggregations configured, a report visual showing “trips by month” would automatically use the in-memory daily agg, accelerating the query. Only when someone drills down to a very granular level (say, individual rides on a specific date) would Power BI revert to querying the DirectQuery fact table.

Microsoft documentation notes that aggregations improve query performance over large DirectQuery models by caching data at an aggregated level. Essentially, you trade some disk/memory space for these summary tables in order to offload work from the big DirectQuery source. This can also reduce load on your database because most high-level queries never hit it at all – they’re served from the Power BI cache. When setting this up, ensure your detail table is in DirectQuery (aggregations only work when the detail is DirectQuery or dual) and the aggregated table is imported. Configure the aggregation mappings (Sum of X, Group by Y, etc.) using the Manage Aggregations dialog.

Tip: Think about user behavior – if 90% of queries are at a higher level (e.g. trends by month or by region), aggregations can drastically speed up the experience. One report showed that using aggregation tables for a fact table with billions of rows not only avoided storage constraints, it significantly sped up DAX measures since most queries hit the pre-aggregated cache. By implementing aggregations in our taxi model, slicing the data by year or by taxi zone becomes instantaneous, while still allowing detail exploration when needed.

Infographic showing best practices for speeding up Power BI queries, including using aggregations, optimizing models, and choosing efficient storage solutions.

5. Use Incremental Refresh for Large Datasets

Regularly refreshing a huge dataset can be time-consuming and resource-intensive. Incremental refresh addresses this by only refreshing the data that has changed (new or updated records) instead of reloading everything each time. It’s a must-have for large fact tables that grow over time (like daily transaction or log data).

With incremental refresh, you define a policy (typically using date/time parameters) to partition your fact table by time – for example, split by month or year. When you publish to the service, the data is stored in separate partitions under the hood. On each refresh, Power BI only fetches new data for recent partitions (and optionally, updates a rolling window). Historical partitions are left as-is. This means refreshes are much faster and use less memory, because you’re not reprocessing years of unchanged data.

Key benefits of incremental refresh include:

Much faster refresh cycles: Since only the most recent data is pulled, refresh duration drops significantly. “Refreshes are faster. Only the most recent data that has changed needs to be refreshed.”
Lower resource usage: Less data to import means less memory and CPU used during refresh. This reduces load on both the Power BI service and the data source.
Supports huge datasets: By partitioning the data, Power BI can handle models with hundreds of millions or billions of rows without full reloads. Each partition refresh is manageable, allowing the overall dataset to grow much larger than it could otherwise.
Reliability and real-time: Shorter refresh jobs are less likely to time out or fail. You can even enable a hybrid table with an extra DirectQuery partition for real-time data on top of the incremental import ones, achieving near-real-time reporting without sacrificing performance on historical data.

Checklist detailing the advantages of incremental refresh in Power BI, such as faster refresh cycles, lower resource usage, and real-time reliability for large datasets.

To set up incremental refresh on the taxi dataset (assuming it’s date-based), you’d do the following: In Power Query, define two parameters RangeStart and RangeEnd (DateTime type). Keep in mind that the parameters are case sensitive. Filter the fact table’s date column using these parameters (e.g. Date >= RangeStart and < RangeEnd). In Power BI Desktop, go to the table’s Incremental Refresh settings and define how many periods to store and how many to refresh. For example, keep 5 years of data, refresh only the last 1 month. Publish to a workspace (Note: Incremental refresh requires Power BI Pro or Premium per user, and to be published to the service to take effect). Once set, your refreshes will only pull, say, the latest month of taxi trips each day, instead of reloading all 5 years.

Result: The initial load might still take time, but subsequent refreshes are lightning-fast. This improves data freshness and reduces downtime. Your dataset can keep growing (each new month just becomes a new partition) without blowing up refresh times. As Microsoft notes, large models with potentially billions of rows can grow without needing to fully refresh the entire model each time.

Finally, always ensure that your data source and Power Query transformations support query folding for the incremental refresh to work efficiently. (We’ll verify that with Query Diagnostics in a moment.) If folding is broken, Power BI might end up fetching the full data each time – defeating the purpose. So incremental refresh and query folding often go hand-in-hand for optimal results.

6. Apply Compression and Encoding Optimization

Even after removing unused data, you should optimize how the remaining data is stored. Power BI’s VertiPaq engine compresses data heavily, but the degree of compression depends on data types and cardinality. Here are some techniques to squeeze your model size further:

Use Optimal Data Types: Prefer numeric columns over text when possible. VertiPaq has two encoding methods: value encoding (for numbers) is extremely efficient, whereas text uses hash encoding which is less so. Hash encoding stores a dictionary of unique values and pointers, which can bloat if there are many distinct values. For example, if you have an ID that looks like “SO123456” (text), consider splitting out the constant prefix “SO” and converting the rest to a number. In one case, converting a text order number to a numeric value yielded significant data reduction for a large table with many unique values. In our taxi data, we might convert pickup and dropoff zone IDs (if stored as text) to integer, or datetime to date + time numeric, etc., to improve compression. Consider splitting Datetime columns in two parts – date and time columns – not only because of the optimized data type but because this also reduce the cardinality of the columns, usually enough to make big difference.
Remove High-Cardinality Columns: Columns with a huge number of unique values (like transaction IDs, exact timestamps, or comments) are poison for compression. If they aren’t needed, dropping them will drastically reduce model size. If they are needed, see if they can be split or encoded (as above) to reduce cardinality. Even boolean or low-cardinality text can compress well (few unique values = better RLE compression). Use VertiPaq Analyzer (discussed below) to identify which columns consume the most memory – often you’ll find a few culprits you can address.
Disable Auto Date/Time: Power BI auto-generates hidden date tables for every date field by default. This convenience can bloat your model enormously if you have many date columns. Those hidden tables also consume memory and are counted in the .pbix size. It’s recommended to turn off the Auto Date/Time feature and instead use a single explicit Date dimension table. In a real case, an analyst found the auto date tables for a model’s date fields were using 76% of the entire model’s memory! Disabling this feature and using one optimized Date table freed up that space (the PBIX went from ~469 MB in memory to just ~113 MB), and also gave more control over date logic. Especially for large models, always disable auto date/time under Options > Data Load, and manage dates manually. You can do this for the current file and globally, as well. Remember that after applying this step PBI desktop should be restarted.
Compress at the Source (if possible): Sometimes you can reduce data size before it even reaches Power BI. Examples: rounding decimals to a set precision (e.g., 2 decimal places for currency) to reduce distinct values, or splitting a DateTime into date and time if you only need date granularity for analysis. This reduces cardinality and helps VertiPaq compress. Also remove any unnecessary text like descriptions or codes that you’re not visualizing.

By applying these compression-oriented optimizations, you ensure your model is as compact as it can be. A smaller model means faster loads and the ability to fit more data in memory if needed. Remember, VertiPaq can often compress data 10x from raw size, but that assumes you’ve given it compressor-friendly data. Optimizing data types and cardinalities can take you beyond that 10x, sometimes dramatically (as seen in the 99% size reduction anecdotes).

Keep Exploring: DAX Query Optimization Techniques for Faster Calculations in Power BI: Advanced Guide

7. Analyze the Power BI Data Model with VertiPaq Analyzer & Power Query Diagnostics

At this stage, you’ve applied several optimizations. How do you verify the impact and ensure everything is working efficiently? This is where analysis tools come in. We’ll use VertiPaq Analyzer to inspect model compression stats and Power Query Diagnostics to confirm query folding and query performance.

7.1 VertiPaq Analyzer – Inspecting Compression & Memory Usage

VertiPaq Analyzer is a tool (integrated in DAX Studio and other external tools) that provides details on the storage structure of your model. It shows table sizes, column cardinalities, dictionary sizes, and other metrics that tell you how VertiPaq has compressed your data . This is incredibly useful for pinpointing which tables or columns are consuming the most memory.

How to use VertiPaq Analyzer (via DAX Studio):

Download and install DAX Studio (if you haven’t already), a free external tool for Power BI.
In Power BI Desktop, with your model open, go to the External Tools ribbon and click DAX Studio. This launches DAX Studio connected to your Power BI model automatically.
In DAX Studio, go to the Advanced tab in the ribbon and click View Metrics (this runs VertiPaq Analyzer). DAX Studio will execute some analysis queries – you’ll see a status window as it gathers stats.
Once done, a VertiPaq Analyzer pane (or window) will show the results. Look at the Summary to see overall memory usage (it might show something like “Database size: X MB”). For example, our optimized taxi model might be, say, 300 MB in memory. (Note: The .pbix file size on disk is usually smaller; memory footprint when loaded can be ~3-4x the file size due to compression structures).
Click on the Tables view to see each table’s size. Identify the largest tables. Then click on Columns to see a breakdown of every column’s size, cardinality, etc. This is sorted by size by default, so you can easily spot the worst offenders.

Using the VertiPaq Analyzer output, verify the effects of your optimizations: Did removing columns drop those columns entirely? Is the fact table size reasonable given row count? Are any dimension tables surprisingly large (which could indicate something like auto date still sneaking in or high-card text)? In one example, the analyzer revealed that hidden date tables were occupying 76% of the model – confirming the need to disable auto date. In another, it might show that a single “Comment” column with high cardinality is using 50% of a table’s space, suggesting an opportunity to cut or split it.

Take note of the dictionary size vs. data size for columns. If a column’s dictionary is huge (meaning many unique values), that column is expensive. For instance, a Trip ID might have as many unique values as rows – a prime candidate for removal or perhaps hashing if absolutely needed. Also, check that your numeric columns are indeed stored as numeric (so they use value encoding). If you see a numeric-looking column stored as text (perhaps you forgot to change a data type in Power Query), you can fix that to improve compression.

By iteratively checking VertiPaq stats, you gain insight into how each model change affects size. It guides you to where further optimizations are possible.

Actionable insight: If VertiPaq Analyzer shows any large unused columns or tables, go back and remove them. If a dimension table like Date is huge, maybe you have an overly large date range or unnecessary columns on it. Use this feedback loop until the model is lean. Advanced users even save the metrics (DAX Studio allows exporting a VPAX file) and track it over time or between versions of the model.

7.2 Power Query Diagnostics – Ensuring Query Folding & Efficient Refresh

Next, we want to ensure that our queries (especially the fact table query with incremental refresh) are folding and performing well. Power Query Diagnostics is a built-in feature that logs all the queries and operations that occur during a data refresh or preview. It’s great for validating that query folding is happening – i.e., that filters are pushed to the source.

How to use Power Query Diagnostics:

In Power BI Desktop, Get Data -> Transform Data to open Power Query Editor.
On the Power Query Editor toolbar, go to the Tools tab and click Start Diagnostics. This begins recording any M queries that run.
Without closing the Power Query Editor, perform the operation you want to trace. For example, to simulate a refresh, you can go back to the main Power BI window (leave the PQ Editor open in the background), right-click your table in Fields pane and choose Refresh Data. This will trigger the query to the data source (for an incremental refresh setup, it will apply the RangeStart/RangeEnd filters for the current window). If you just want to test a preview, you could also refresh the preview in the editor, but note that preview may not always fold exactly like a full refresh. It’s best to simulate an actual refresh as described.
Once the refresh is done, switch back to Power Query Editor and click Stop Diagnostics.
Power Query will create two new query objects in your workbook: Diagnostics – Detailed and Diagnostics – Summary (in a group called Diagnostics). The detailed one contains a row for every operation and query that occurred.

Now, look at the Detailed diagnostics table. You’ll see columns like Query, Step, Data Source Query, Duration, etc. Focus on the Data Source Query column – this shows the actual native queries sent to your data source. Check if your filters and transformations were applied. For example, if incremental refresh is set up, you should see your fact table’s SQL (or query) with a WHERE clause on the date (e.g. WHERE Date >= ‘2020-01-01’ AND Date < ‘2020-02-01’ or similar). This confirms that Power BI folded the date range filter down to the source. If instead you see a query pulling the entire dataset (no where clause) or some evidence of Power BI doing filtering in memory, that’s a red flag – it means query folding was not achieved for that step.

Use the diagnostics info to troubleshoot: If folding isn’t happening, identify which step broke it. Common culprits include certain transformations (like merging queries, or using custom M functions) that aren’t foldable. You may need to adjust the Power Query steps (ensure the date filter is the last step before loading, etc.) to maintain folding. The diagnostics will also show how long each query took (Duration column), so you can see if any particular part of the refresh is slow. Perhaps one dimension table query is taking a long time – you might then optimize that or check its folding as well.

For our taxi dataset, we’d verify that the query to load taxi trips for, say, the last month is indeed only pulling that month’s data from the source. If we see that all years were pulled and then filtered, we know something’s wrong (and we’d fix the M query to push the filter). Power Query Diagnostics is essentially your truth serum – it shows exactly what’s happening under the covers, so you can ensure your model refresh is as efficient as intended.

(Advanced tip: You can also use SQL Profiler or the source’s query log to similar effect, but PQ Diagnostics is convenient and source-agnostic.)

By combining VertiPaq Analyzer and Power Query Diagnostics, an advanced Power BI modeler can both validate and iterate on their optimizations. These tools answer questions like: Did my column removal actually reduce size? Which columns still consume most memory? Is my incremental refresh truly pulling partial data? Are all my expensive transformations folding? This insight allows for data-driven optimization – you’re not guessing, you’re measuring.

8. Case Study: Optimizing a Large Power BI Data Model – Before & After

Let’s tie it all together with a real-world scenario. Suppose we have a Power BI dataset for a ride-sharing company with 100 million trip records over 10 years (similar to our NYC taxi example). The initial model was a straightforward import of all data with a few dimension tables. It worked, but barely – the .pbix file was huge, refreshes took over 2 hours, and some report visuals were timing out. Users complained that the dashboard was sluggish, especially when filtering large date ranges.

Challenges Faced (Before Optimization):

The dataset size on disk was ~1.2 GB; in-memory it occupied ~4 GB when loaded (pushing the limits of the capacity).
Auto date/time was left on, which unbeknownst to the developers added 10 hidden date tables, one for each date column, further inflating the model (these auto-generated tables accounted for more than half the model size).
Every daily refresh reloaded the entire 10 years of data from the data warehouse, often straining the gateway and frequently failing.
Key columns like “Ride ID” and exact “Timestamp” had high cardinality, contributing to poor compression.
All tables were in Import mode; as data grew, the model was nearing the Pro size limit and there were discussions about needing Power BI Premium just to handle it.

Optimization Steps Applied:

Removed Unused Columns/Tables: By reviewing the report requirements and using VertiPaq Analyzer, the team identified several large columns that were not used in any report or DAX measure (e.g. a free-form driver notes field). They removed these, cutting out millions of distinct text values and reducing model size by a significant margin. They also turned off Auto Date/Time and created one calendar table. This alone eliminated those 10 hidden date tables that were taking ~76% of the model’s memory – an enormous win. The model size plummeted from 4 GB to about 1.1 GB in memory after this step, simply by dropping the cruft.
Filtered to Relevant Data: Initially all 10 years of data were included, but analysis showed that reports rarely looked back more than 3 years. They implemented a Power Query date filter to only include the last 5 years in the model (retaining extra just in case some trend needed it, but dropping older history). This “horizontal filtering” removed ~30% of the rows, further shrinking the model to under 800 MB. No end-user noticed the difference, except that everything got faster.
Introduced Aggregations: They created monthly and daily aggregation tables for key metrics (trips count, revenue, etc.). These tables were a tiny fraction of the original fact rows. With proper aggregation settings, high-level visuals (like yearly trends, or total active drivers per month) began hitting the cache rather than the 100M-row table. The result was that those visuals that used to take 15-20 seconds to render now came back in less than a second – a 20x speedup in some cases, as the heavy lifting was eliminated. Meanwhile, detailed drill-through pages still worked via DirectQuery (see next step).
Moved Detail to DirectQuery + Dual dimensions: To keep the model size in check, they switched the fact table to DirectQuery and kept only the aggregation tables imported. Since the dimension tables (Date, Driver, Region) were not too large, they set them to Dual. This composite model approach allowed interactive analysis without consuming massive memory. Any detail query (like viewing all trips for a single day) would hit the source on demand. Thanks to aggregations, 95% of queries never needed to touch the source. The impact was substantial: the published dataset .pbix went from 1.2 GB down to ~250 MB. The in-memory working set on the Premium capacity was similarly small most of the time, since only aggs and dimensions were cached. Performance Analyzer showed most visuals running DAX queries under 1 second, whereas before many took 5-10+ seconds.
Enabled Incremental Refresh: The fact table (now DirectQuery for detail) was also enabled with an incremental refresh policy for the imported agg tables (which were partitioned by month). Only the latest partition (month) refreshes daily; the rest of the data remains untouched. This made the refresh process very fast and reliable. What used to be a 2+ hour refresh for the full data became a 5-minute refresh each morning for the latest increment. Resource consumption dropped so much that the team’s Power BI tenant stopped experiencing midday slowdowns (which were previously caused by the heavy refresh). The users got fresher data (daily instead of weekly refresh) with virtually no failures. The refresh logs confirmed only a few thousand new rows were pulled each day rather than re-pulling millions.

Outcomes (After Optimization):

90% Reduction in Model Size: From ~4 GB in memory down to ~400 MB active in memory (when a user is querying at aggregate level) – an order of magnitude smaller. Much of this came from eliminating unneeded data and using aggregations. One article famously achieved 99% reduction with a combination of such techniques, and this case was in that ballpark. Smaller size means the dataset can now comfortably fit in a Power BI Premium Per User workspace without issues.
10-20x Faster Report Performance: Key visuals that were sluggish are now instant. Overall report load time dropped from ~30 seconds to ~3 seconds on initial view. Interactive filtering has no noticeable lag. By caching the heavy queries (aggregations) and limiting what’s scanned, the user experience improved dramatically. As one blog noted, fewer columns/rows means less data to process and faster queries – our case proved that out.
Stable and Scalable Refresh: Incremental refresh turned the refresh process from a daily fire-fight to a non-event. The data team can extend the retention period easily or even move to real-time updates using a small DirectQuery portion if needed. The solution scales: as data grows, they’ll just get more partitions, but not a longer refresh. The Power Query Diagnostics confirmed query folding is working (each partition refresh only queries that partition’s range, not the whole set), so the architecture is solid for the future.
Positive User Feedback: The business users noticed the improvements – no more complaints about slowness; instead, they asked for more complex analysis now that the model could handle it. The success of this optimization effort increased confidence in using Power BI for large-scale data. The team avoided an expensive capacity upgrade by optimizing at the model level.

Conclusion of case study: This example shows that even if your dataset is “large”, smart modeling techniques can make it performant. Removing unnecessary data, leveraging aggregations, splitting storage modes, and using incremental refresh transformed a struggling Power BI report into a snappy, scalable solution. The key takeaway is that model optimization is often the difference-maker – turning Power BI from barely handling a dataset to excelling with it. Advanced users should regularly apply these practices when dealing with big data models.

Power BI Data Models Optimization for Large Datasets: Next Steps

Optimizing Power BI data models for large datasets is both an art and a science. As an advanced Power BI professional, you have an array of techniques at your disposal – from trimming the fat (unused columns/rows) and tuning storage modes, to leveraging aggregations, incremental refresh, and compression tricks. Always measure the impact using tools like VertiPaq Analyzer for memory stats and Query Diagnostics for query behavior, so you can continuously refine your approach. By implementing the strategies in this guide, you’ll be able to handle much larger datasets within Power BI while delivering fast, reliable reports to your end users.

Remember, performance optimization is an iterative process. Keep an eye on your model as data grows or requirements change, and adjust accordingly. With diligent optimization, it’s possible to reduce a model’s size by 90%+ and make reports 10x faster, as we demonstrated.

Have Power BI Questions?

Let’s talk!

Ask an expert at +1 888 564 1235 (for US) or +359 2 493 0393 (for Europe) or fill in our form below to tell us more about your project.

Services

Data Analytics & BI

Data Management & Cloud

AI & Machine Learning

Enterprise Performance Management

Support & Enablement

Solutions

Enterprise Planning & Forecasting

Supply Chain Planning & Optimization

AI & Generative AI Solutions (B EYE Labs)

Data Integration & Advanced Analytics

How to Optimize Power BI Data Models for Large Datasets [Advanced Step-by-Step Guide]

1. Use a Large Public Dataset for Testing

2. Reduce Model Size by Removing Unused Data

3. Choose the Right Storage Mode (Import vs. DirectQuery vs. Dual)

4. Implement Aggregations for Better Query Performance

5. Use Incremental Refresh for Large Datasets

6. Apply Compression and Encoding Optimization

7. Analyze the Power BI Data Model with VertiPaq Analyzer & Power Query Diagnostics

7.1 VertiPaq Analyzer – Inspecting Compression & Memory Usage

7.2 Power Query Diagnostics – Ensuring Query Folding & Efficient Refresh

8. Case Study: Optimizing a Large Power BI Data Model – Before & After

Power BI Data Models Optimization for Large Datasets: Next Steps

Simplify and Automate Corporate Incentives with Anaplan

Build a Robust AI Data Strategy: Readiness Assessment and Implementation Framework

Efficient Data Visualization: Tools and Tactics for Finance Teams

Discover the
B EYE Standard

Related Articles

Retail Analytics Consulting: Strategic Implementation Guide

Predictive Analytics Services: Strategic Implementation Guide

How To Reduce Late Shipments with Data in 3 Steps

How to Boost Efficiency in Manufacturing Data Analytics

About Us

USA

Bulgaria

Services

Data Analytics & BI

Data Management & Cloud

AI & Machine Learning

Enterprise Performance Management

Support & Enablement

Solutions

Enterprise Planning & Forecasting

Supply Chain Planning & Optimization

AI & Generative AI Solutions (B EYE Labs)

Data Integration & Advanced Analytics

1. Use a Large Public Dataset for Testing

2. Reduce Model Size by Removing Unused Data

3. Choose the Right Storage Mode (Import vs. DirectQuery vs. Dual)

4. Implement Aggregations for Better Query Performance

5. Use Incremental Refresh for Large Datasets

6. Apply Compression and Encoding Optimization

7. Analyze the Power BI Data Model with VertiPaq Analyzer & Power Query Diagnostics

7.1 VertiPaq Analyzer – Inspecting Compression & Memory Usage

7.2 Power Query Diagnostics – Ensuring Query Folding & Efficient Refresh

8. Case Study: Optimizing a Large Power BI Data Model – Before & After

Power BI Data Models Optimization for Large Datasets: Next Steps

Discover the B EYE Standard

Related Articles

Retail Analytics Consulting: Strategic Implementation Guide

Predictive Analytics Services: Strategic Implementation Guide

How To Reduce Late Shipments with Data in 3 Steps

How to Boost Efficiency in Manufacturing Data Analytics

About Us

USA

Bulgaria

Discover the
B EYE Standard