Prasad.dev
All posts
·30 min read

The Timestamp Trap in Power BI Incremental Refresh: Why Correct Configuration Can Still Produce Stale Data

Custom pollingExpression is documented. The real problem is choosing a change signal that represents business truth, not ETL noise.

power-bi
fabric
incremental-refresh
data-engineering

Why this article exists

Custom polling expressions in Power BI incremental refresh have been documented since 2022. Chris Webb has a worked example. Microsoft Learn covers the XMLA configuration. Tabular Editor exposes pollingExpression as a refresh-policy property.

This article isn't about the mechanism. It's about a production failure mode that the documentation doesn't surface clearly: the wrong change signal makes incremental refresh either silently stale or unnecessarily expensive — regardless of how correctly you configure the mechanism.

The refresh succeeds. No errors. No warnings. The semantic model just stops reflecting the source. Or, equally bad, every refresh re-imports every partition because warehouse housekeeping keeps tripping the polling signal.

If you've already read Chris Webb's post and Microsoft's docs, this article still has something for you. The focus here is on three production patterns that the existing literature underweights:

  1. The taxonomy of change-signal types (creation timestamp, warehouse audit timestamp, business modification timestamp) and which fails in which way.
  2. The Power BI Desktop UI behavior that can silently overwrite a custom polling expression after deployment.
  3. A pattern for detecting soft-deletes that doesn't reintroduce warehouse housekeeping noise.

If you haven't read the prior art, the article is also self-contained.


TL;DR

Power BI incremental refresh can be technically configured correctly and still return stale data.

The common failure mode is polling on the wrong timestamp:

  • CreatedDate misses edits to old rows — the MAX doesn't move when existing rows are edited.
  • _dw_updated_at catches edits but also catches ETL backfills, migrations, and reprocessing noise — defeating incremental refresh.
  • A source-owned business modification timestamp is the cleanest signal.
  • A custom pollingExpression lets you query that signal directly from the warehouse without importing the column into VertiPaq — only the scalar result is stored, as the partition's refreshBookmark.
  • Soft deletes need a separate conditional pattern. Hard deletes need a tombstone table or CDC.
  • The Power BI Desktop incremental refresh dialog can overwrite a custom polling expression on Apply. Source-control the TMDL and treat the Desktop dialog as unsafe.

If you only want the implementation pattern, jump to sections 6, 8, and 11. The rest is signal-design reasoning and operational gotchas.


Scope and applicability

This article applies to Import-mode semantic models on warehouse sources (SQL Server, Synapse, Snowflake, Databricks, BigQuery, Postgres, etc.) using Power BI's incremental refresh policy.

It does not apply to:

  • Direct Lake models over Fabric Delta tables. Direct Lake uses a different freshness and framing mechanism and does not use Import-mode incremental refresh policies or pollingExpression.
  • DirectQuery models, which read from the source on every query and have no refresh policy concept.
  • Hybrid models with a real-time DirectQuery partition — the incremental partitions still use pollingExpression, but the DirectQuery partition has its own freshness model.

Custom polling expression editing requires either a .pbip project (Power BI Project file format) for TMDL editing, or an XMLA endpoint connection (Premium / Premium-per-User / Fabric capacity) for Tabular Editor's live editing workflow.


Table of contents

  1. Background — how incremental refresh actually works
  2. The default polling pattern and what it misses
  3. The change-signal taxonomy
  4. Why warehouse audit columns over-refresh
  5. What a clean business modification timestamp looks like
  6. The solution — polling without importing the column
  7. Why this works — pollingExpression is metadata, the column is not
  8. Where to make the change — TMDL, Tabular Editor, XMLA
  9. Pre-deploy and post-deploy checklist
  10. The Power BI Desktop UI footgun
  11. Handling soft deletes without re-introducing noise
  12. Edge cases — late-arriving facts, query folding, NULL handling
  13. What changes after deploying this
  14. Transferable pattern

1. Background — how incremental refresh actually works

Power BI incremental refresh splits a large table into time-based partitions and refreshes only the partitions whose contents have changed, rather than reloading everything on every schedule.

Two expressions in the refresh policy drive the behavior:

  • sourceExpression — Power Query M code that loads data for a given partition. Returns a table. The result populates VertiPaq storage and is queryable by every measure and visual.
  • pollingExpression — Power Query M code that decides whether a partition needs to be re-imported at all. Returns a single scalar value. Power BI stores this scalar as the partition's refreshBookmark. On the next refresh, the new scalar is compared against the stored bookmark. Same value → skip the partition. Different → re-run sourceExpression and update the bookmark.

A quick preview of the distinction that becomes the central point of section 7:

sourceExpressionpollingExpression
ReturnsA table (becomes model data)A single scalar (becomes partition metadata)
Storage cost in VertiPaqLinear with row countNone — only a scalar bookmark per partition in model metadata

The polling step is what makes incremental refresh fast at scale. Without it, every refresh would have to either re-import every partition (defeating the point) or trust some external signal that the data hasn't changed (fragile).

This article is about pollingExpression specifically — what to put in it, and why most defaults are wrong.


2. The default polling pattern and what it misses

When you toggle "Detect data changes" in Power BI Desktop's incremental refresh dialog, the UI auto-generates a pollingExpression that reads from a column you pick from a dropdown:

pollingExpression = let
    MaxDateTime = List.Max(#"FactTable"[CreatedDate]),
    accountForNull = if MaxDateTime = null
                     then #datetime(1901, 1, 1, 0, 0, 0)
                     else MaxDateTime
in accountForNull

The pattern: take the model column, find its MAX within the partition, return that scalar. If MAX hasn't moved since last refresh, the partition is treated as stable.

This works correctly for insert-only tables. New rows increase the MAX. Polling sees the change. Refresh fires.

For tables where rows can be edited in-place after creation, this pattern fails — but not in the sense of a Power BI bug. The refresh succeeds. Power BI does exactly what it was configured to do. The chosen signal is semantically wrong: a creation timestamp does not move when an existing row is edited, so the MAX does not move, so the partition is skipped, so the edit never lands in the model.

The semantic model becomes progressively stale relative to the source. No error surfaces. No anomaly appears in refresh history. The report just shows yesterday's values for rows that were edited today, last week, last year. Active partitions (recent months that get refreshed every cycle) stay current; everything older drifts.

This is the "timestamp trap" — and it's a signal-design problem, not a configuration problem.


3. The change-signal taxonomy

When you're picking a column for change detection, every candidate falls into one of four categories. Three of them have well-defined failure modes.

Signal typeExample column namesCatches edits?Catches deletes?Tripped by ETL noise?
Creation timestampCreatedDate, OrderDate, EventTime
Warehouse audit timestamp_dw_updated_at, LastModified, etl_loaded_at✅ (if soft delete sets it)✅ (usually heavily)
Business modification timestampLastEditedAt, RecordUpdatedAt, BusinessModifiedDate❌ (deletes don't move it)
Composite signal (conditional)Custom expression✅ (with care)❌ (with care)

The default Power BI Desktop UI workflow nudges you toward a creation timestamp because that's typically the column you used for RangeStart/RangeEnd partitioning, and it's the most obvious "date" column on the table. For insert-only data, this is correct. For mutable data, it's the timestamp trap.

The instinct after discovering the trap is to switch to a warehouse audit column — and that fails for a different reason.


4. Why warehouse audit columns over-refresh

A warehouse audit column is updated by the warehouse's ETL pipeline, not by the business application of record. It changes when:

  • A new row is inserted (business signal — correct)
  • An application edits the underlying record (business signal — correct)
  • A soft-delete flag is flipped (business signal — correct)
  • A migration or one-time backfill runs (noise)
  • A schema change touches every row (noise)
  • An ETL pipeline reprocesses a partition (noise)
  • A deduplication or SCD rebuild runs (noise)
  • A type-2 dimension's surrogate keys get renumbered (noise)

If your DBA runs UPDATE FactTable SET _dw_updated_at = GETDATE() to fix a structural issue overnight, Power BI sees every partition as changed and re-refreshes the entire historical window on the next cycle. The whole point of incremental refresh — skipping stable partitions — is defeated.

This isn't to say warehouse audit columns are universally wrong. Some warehouses maintain a strict ETL contract guaranteeing that the audit column only changes when the underlying business entity changes. In those environments, polling on the audit column is fine. But that's a strong contract to assume by default, and most warehouse ETL pipelines do not honor it.

The defensible rule: a warehouse processing timestamp is usually the wrong polling signal unless your ETL contract guarantees it only changes when the business entity changes.


5. What a clean business modification timestamp looks like

The reliable signal is a column maintained by the source application of record, set when the business meaning of a row changes. Common names: LastEditedAt, RecordUpdatedAt, LastModifiedByUser, BusinessModifiedDate. The key property: it moves when business changes happen, not when ETL touches the row.

For this column to be reliable as a polling signal, the source contract needs to guarantee: it's updated on every business-relevant write (inserts, updates, and soft-delete flag flips), timezone-consistent (mixed UTC/local produces silent comparison errors), and populated for the full row history (recently-added columns with null backfills break the MAX calculation). Second-level precision is usually enough.

If your source schema doesn't have such a column, the right answer is to get it added at the source — there is no purely warehouse-side workaround that produces correct semantics for mutable data. You can sometimes derive it (from a CDC stream or row-versioning column), but the cleanest path is a true business modification timestamp owned by the source application.

A common worry when first considering this fix: "our business modification rate is very low — under 1% of rows. Is that enough signal?"

Yes. Polling fires on MAX changing, not on a percentage. When even one row in a partition gets a new modification timestamp, that partition's MAX moves. The polling mechanism catches whatever changes, no matter how rare. The percentage tells you the data is stable; it doesn't affect whether polling works.


6. The solution — polling without importing the column

The standard pollingExpression pattern reads a model column:

List.Max(#"FactTable"[LastEditedAt])

That requires the column to be imported into the model. For a high-cardinality timestamp column on a large fact table, that can easily cost hundreds of megabytes for a column no measure, visual, or relationship actually uses.

Power BI's M language lets pollingExpression query the source directly, not just a column already in the model. The fix is to query the source for the audit column's MAX, scoped to the partition, at refresh time.

Here's a concrete example against SQL Server using parameterized native query:

let
    Source = Sql.Database("server-name", "database-name"),
    Result = Value.NativeQuery(
        Source,
        "
        SELECT MAX(
            CASE
                WHEN LastEditedAt IS NULL THEN CreatedDate
                WHEN LastEditedAt > CreatedDate THEN LastEditedAt
                ELSE CreatedDate
            END
        ) AS MaxModified
        FROM dbo.FactTable
        WHERE CreatedDate >= @RangeStart
          AND CreatedDate < @RangeEnd
        ",
        [RangeStart = RangeStart, RangeEnd = RangeEnd]
    ),
    MaxModified = Result{0}[MaxModified],
    accountForNull = if MaxModified = null
                     then #datetime(1901, 1, 1, 0, 0, 0)
                     else MaxModified
in
    accountForNull

The same pattern adapts to other sources:

  • Snowflake: swap Sql.Database for Snowflake.Databases, use GREATEST(CreatedDate, COALESCE(LastEditedAt, CreatedDate)) instead of the CASE expression (Snowflake's GREATEST is null-safe in this form via the COALESCE).
  • Databricks: use Databricks.Catalogs and Spark SQL syntax.
  • BigQuery: use GoogleBigQuery.Database and GREATEST with IFNULL for null safety.
  • Postgres: use PostgreSQL.Database and GREATEST with COALESCE.

Why the CASE expression instead of GREATEST for the SQL Server example: in T-SQL, GREATEST was only added in SQL Server 2022. The CASE form works on all versions and is also null-safe by design. GREATEST NULL-handling varies by SQL dialect. Snowflake, BigQuery, and MySQL return NULL if any argument is NULL. SQL Server 2022+ and PostgreSQL ignore NULL arguments unless all are NULL. Because the behavior is engine-specific, use explicit COALESCE or a portable CASE expression — both work everywhere.

Key mechanics:

  • RangeStart and RangeEnd are Power BI's built-in partition-scoped parameters. The polling query is automatically filtered to one partition at a time, just like sourceExpression is.
  • Parameterized native query is preferred over string-concatenated SQL. Value.NativeQuery with a parameter table is safer (no formatting/timezone surprises in the string conversion) and folds cleanly to the source. For connectors that don't support parameterized native queries, fall back to careful string concatenation with DateTime.ToText — but verify the timezone conversion behavior.
  • The CASE (or GREATEST + null-handling) expression picks whichever of creation or modification timestamp is more recent. Catches new inserts (creation moves) and edits (modified moves). Always wrap the modification column with explicit null handling — older rows where the column was null before being backfilled are common.
  • The business modification column is not imported into the model. The polling expression queries the source live at refresh time. The column never enters VertiPaq.
  • accountForNull is critical: if the polling query returns null (empty partition, permissions issue, schema drift), Power BI compares null to the previous bookmark. The fallback #datetime(1901, 1, 1, 0, 0, 0) ensures a deterministic value rather than a comparison that might silently match a prior null.

One constraint to be aware of: the polling query and the partition's sourceExpression must use the same Power BI data source identity. If your audit signal lives in a different catalog or database, expose it through a view or query path that uses the same configured connection.


7. Why this works — pollingExpression is metadata, the column is not

This is the framing that makes the whole pattern click, and the one most easily missed.

sourceExpression and pollingExpression are both written in Power Query M, but they serve different purposes:

sourceExpressionpollingExpression
ReturnsA tableA single scalar value
Result is stored inVertiPaq (the model's columnar store)Partition metadata as a refreshBookmark
Visible to DAX measures and visuals?YesNo — internal only
Storage cost in the datasetLinear with row count and cardinalityNo VertiPaq column storage — only a small scalar bookmark per partition in BIM/TMDL metadata
Runs whenOnly for partitions polling says are "changed"Every partition, every refresh

When pollingExpression runs, Power BI takes the scalar it returns and writes it onto the partition's refreshBookmark — exposed in the Tabular Object Model (TOM) as the RefreshBookmark property on the partition (older docs and tools may refer to it as PollingPosition). On the next refresh, the new scalar is compared against the stored bookmark. If they match, the partition is left alone. If they differ, sourceExpression runs and the bookmark is updated to the new scalar.

The source column never enters VertiPaq. Only the scalar result is persisted, as a per-partition bookmark in the model's metadata. The storage cost is the size of a few timestamps, not the size of a column with millions of values.

This is what makes the "query the source without importing the column" pattern work. The column is read live at refresh time, the MAX is computed in the source (assuming query folding works — covered in section 12), one timestamp comes back over the wire, gets stamped onto the partition's metadata, and is compared on the next refresh. Nothing accumulates in VertiPaq. The model stays exactly the size it was before.

The runtime cost is small: at refresh time, you fire one tiny scalar query per partition. With proper indexing or partition pruning on the source, that's typically a small fraction of total refresh time.


8. Where to make the change — TMDL, Tabular Editor, XMLA

The Power BI Desktop UI cannot express this pattern. The "Incremental refresh" dialog's "Detect data changes" toggle only lets you pick a model column from a dropdown — no custom expressions, no source-direct queries.

Three ways to configure it:

Option 1 — Edit the TMDL file directly (.pbip project)

In a Power BI Project file (.pbip), the model is stored as TMDL files on disk. The fact table's refresh policy lives at:

<pbip-folder>/<Model Name>.SemanticModel/definition/tables/<table>.tmdl

Inside that file, the refreshPolicy block sits near the top of the table definition. Edit it directly:

Note: the triple-backticks inside the code block below are intentional — TMDL uses them as multi-line string delimiters around the M expression.

table FactTable
 
    sourceLineageTag: [dbo].[FactTable]
    
    refreshPolicy: basicAutoIncrementalPolicy
        incrementalGranularity: month
        incrementalPeriod: 12
        rollingWindowGranularity: month
        rollingWindowPeriod: 36
        pollingExpression = ```
            let
                Source = Sql.Database("server-name", "database-name"),
                Result = Value.NativeQuery(
                    Source,
                    "
                    SELECT MAX(
                        CASE
                            WHEN LastEditedAt IS NULL THEN CreatedDate
                            WHEN LastEditedAt > CreatedDate THEN LastEditedAt
                            ELSE CreatedDate
                        END
                    ) AS MaxModified
                    FROM dbo.FactTable
                    WHERE CreatedDate >= @RangeStart
                      AND CreatedDate < @RangeEnd
                    ",
                    [RangeStart = RangeStart, RangeEnd = RangeEnd]
                ),
                MaxModified = Result{0}[MaxModified],
                accountForNull = if MaxModified = null
                                 then #datetime(1901, 1, 1, 0, 0, 0)
                                 else MaxModified
            in
                accountForNull
            ```
 
    column CreatedDate
        ...

The triple-backticks are TMDL syntax for multi-line string property values. The M code itself is what sits between them. Power BI Desktop picks up TMDL changes the next time the project is opened.

This is the path of choice for Git-based workflows where the refresh policy needs to be version-controlled and code-reviewed alongside other model changes.

Option 2 — Tabular Editor 2 (free, recommended for live model editing)

Tabular Editor 2 connects to a running Power BI Desktop instance (via diagnostic port) or to an XMLA endpoint in Service. It exposes the refresh policy as editable properties: Tables → <fact table> → Refresh Policy → Polling Expression.

Paste the custom expression, save, and the change is applied to the live model. This is the standard workflow for Power BI engineers managing custom refresh policies without the Git overhead.

Option 3 — XMLA / TMSL scripting (Premium / PPU / Fabric)

For models hosted in Premium, Premium-per-User, or Fabric capacities with the XMLA endpoint enabled in read/write mode, you can push TMSL scripts that modify the refresh policy programmatically. Useful in CI/CD pipelines:

{
  "createOrReplace": {
    "object": { "database": "...", "table": "FactTable", "refreshPolicy": {} },
    "refreshPolicy": {
      "policyType": "basic",
      "pollingExpression": "let ... in accountForNull",
      ...
    }
  }
}

Overkill for a one-off change, but the right answer if you're managing tens or hundreds of models programmatically.

A licensing note: incremental refresh itself is supported on Pro, Premium-per-User, Premium, and Fabric SKUs. Advanced refresh-policy editing via XMLA requires Premium / Premium-per-User / Fabric capacity. TMDL editing via .pbip works on any tier.


9. Pre-deploy and post-deploy checklist

Five things that bite teams deploying custom polling for the first time:

1. Gateway / connection credentials

The custom pollingExpression queries the source directly at refresh time. In Power BI Service, refreshes run under the dataset's saved credential (gateway service account or service principal), not under your interactive identity. That account needs SELECT permission on the table referenced in the polling query — which may differ from the table the sourceExpression loads if you're polling a related audit table.

Test before publishing: have the gateway account run the polling SQL directly against the source. If it fails, the polling failure will surface in refresh history, but it's easier to catch upfront.

2. Same data source identity

The polling query and the partition's source must share the same Power BI data source identity. If the audit signal lives in a different catalog or database, expose it through a view or query path that uses the same configured connection. Cross-server queries through a different data source connection are not supported by the refresh policy.

3. First-refresh behavior

When bookmarks do not yet exist, Power BI has to process partitions to seed them. In a new deployment, expect the first refresh to process the full incremental window. Subsequent refreshes are when you'll see partitions starting to skip.

Verify on the second refresh: most partitions should show "No data changes detected" in refresh history rather than "Refreshed." That's the polling mechanism working correctly.

4. Post-deploy refresh bookmark validation

After the first refresh, inspect the partition metadata through XMLA (DAX Studio, SSMS, or Tabular Editor) and confirm the bookmark/polling-position value is populated for each partition. The Tabular Object Model exposes this as the RefreshBookmark property on the partition.

As one example of how to query this via DMV (column names and exposure vary by tool version, so verify on your stack):

SELECT [Name], [RefreshBookmark]
FROM $SYSTEM.TMSCHEMA_PARTITIONS
WHERE [TableID] = <your_table_id>

Confirm that bookmarks are populated and look like the timestamps your polling query is returning. Empty or null bookmarks indicate the polling query failed silently or returned null.

5. The Desktop UI overwrite risk (see section 10)

Critical enough that it gets its own section.


10. The Power BI Desktop UI footgun

After a custom polling expression is deployed, the Power BI Desktop incremental refresh dialog cannot represent it. The dropdown only knows how to display model columns, and a custom polling expression isn't a column reference.

The footgun: opening the dialog after a custom expression is deployed appears to be a read-only inspection, but isn't. The dialog reads the current state, displays a blank or default-selected dropdown (because it can't parse the custom M), and on Apply, can write back its dropdown selection as the new polling expression — overwriting your custom one with a regenerated column reference.

The exact behavior depends on Power BI Desktop version and whether other refresh-policy fields are modified. In practice, treat the dialog as unsafe to even open on models with custom polling expressions, until you've verified the behavior on your specific version.

A safe reproduction test (run this once on a non-production model):

  1. Deploy a .pbip with a custom pollingExpression in TMDL.
  2. Commit the TMDL state. Note the pollingExpression block content.
  3. Open Power BI Desktop. Open the table's incremental refresh dialog.
  4. Without changing anything, click Apply (or click OK to close).
  5. Save the .pbip and git diff the TMDL.

If the pollingExpression block changed, you've reproduced the footgun and now know to keep the dialog closed. If it didn't, your version may be safe — but verify after every Desktop upgrade.

Mitigations regardless of reproduction:

  • Source-control the TMDL. .pbip + Git means any silent overwrite shows up in the next git diff. This is one of the strongest arguments for .pbip over .pbix on any model with custom refresh policies.
  • Use Tabular Editor for refresh-policy changes. It reads the current state correctly and doesn't auto-generate replacement expressions.
  • Republishing PBIX is also risky. Microsoft explicitly warns that republishing from Desktop can overwrite the entire incremental refresh policy and force a full refresh. For service-side custom polling expressions, prefer metadata-only deployments via ALM Toolkit or XMLA scripting.

The general principle: once you've deployed a custom polling expression, the model's refresh policy lives in your source-controlled TMDL or in the deployed metadata, not in whatever the Desktop dialog wants to regenerate. Treat any Desktop session on such a model as potentially destructive until proven otherwise.


11. Handling soft deletes without re-introducing noise

The polling pattern in section 6 catches inserts and edits. It does not catch deletes — a business modification timestamp does not move when a row is removed.

For sources that implement soft deletes (marking a row with an IsDeleted flag rather than removing it physically), you can detect deletions by combining the business modification timestamp with a conditional check on the deletion flag. A critical gotcha to get right first: in some SQL dialects (Snowflake, BigQuery, MySQL), GREATEST returns NULL if any argument is NULL — though SQL Server 2022+ and PostgreSQL ignore NULL arguments instead. Because the behavior is engine-specific, the safest pattern is explicit COALESCE or a portable CASE expression. A naive CASE WHEN IsDeleted = 1 THEN audit_ts END evaluates to NULL for every non-deleted row, which collapses the entire GREATEST to NULL and forces a full refresh every cycle.

The fix is to give the CASE an ELSE branch that falls back to a non-null timestamp:

SELECT MAX(GREATEST(
    CreatedDate,
    COALESCE(LastEditedAt, CreatedDate),
    CASE WHEN IsDeleted = 1
         THEN WarehouseModifiedAt
         ELSE CreatedDate
    END
)) AS MaxModified
FROM dbo.FactTable
WHERE CreatedDate >= @RangeStart
  AND CreatedDate < @RangeEnd

The CASE clause returns WarehouseModifiedAt for soft-deleted rows (where the flag flip set the audit column) and either CreatedDate or COALESCE(LastEditedAt, CreatedDate) for non-deleted rows. New soft-deletes move the MAX. Warehouse housekeeping noise on non-deleted rows is excluded from the MAX. The result: insert + edit + soft-delete detection without re-introducing the noise problem from section 4.

For SQL Server versions before 2022 (no GREATEST), or for defensive coding, the nested-CASE form is more portable:

SELECT MAX(
    CASE
        WHEN IsDeleted = 1 THEN WarehouseModifiedAt
        WHEN LastEditedAt IS NOT NULL AND LastEditedAt > CreatedDate THEN LastEditedAt
        ELSE CreatedDate
    END
) AS MaxModified
FROM dbo.FactTable
WHERE CreatedDate >= @RangeStart
  AND CreatedDate < @RangeEnd

Same semantics, no GREATEST dependency, explicit null handling at every branch.

Important caveats — this pattern works only if all of the following hold:

  1. Deleted rows remain visible to the polling query. If your source system or warehouse purges soft-deleted rows after a retention window, the pattern stops detecting deletes once the row is gone.
  2. The deletion flag is set atomically with the warehouse audit timestamp. Some pipelines update the flag without touching the audit column, or vice versa. Verify the contract on your source.
  3. The partition's sourceExpression must cooperate. Detecting that a partition has soft-deletes only matters if your sourceExpression is also filtering them out. Otherwise the polling triggers a refresh that re-imports the same deleted rows, marked as deleted but still present in the model. Make sure your partition load includes a WHERE IsDeleted = 0 (or equivalent) clause.
  4. Hard deletes are not covered. If a row is physically removed from the source, no polling expression can detect it — the row simply isn't there to be measured. Hard deletes require either a separate tombstone table, CDC stream, or scheduled historical-partition full-refresh.
  5. Late-arriving deletes still need partition visibility. If a row created in 2023 is soft-deleted today, the deletion needs to be detected by polling the 2023 partition. The pattern works for this case, but only if the soft-deleted row remains in the source long enough to be polled.

For sources without soft deletes (rows are physically removed), the practical options are:

  • Add a tombstone or CDC mechanism at the source.
  • Periodically full-refresh historical partitions on a schedule (defeats some of the incremental benefit but is sometimes the only option).
  • Switch to a model architecture (Direct Lake, DirectQuery) where the question doesn't apply.

This pattern is best framed as soft-delete or tombstone-aware change detection, not as universal delete handling.


12. Edge cases — late-arriving facts, query folding, NULL handling

Three issues that don't fit cleanly into a numbered checklist but matter in production:

Late-arriving facts and partition-key mismatch. A common real-world pattern: a row is created today but represents a business event from last year. The partition the row belongs to (based on RangeStart/RangeEnd filtering on the creation timestamp) is the current month, so the row lands in the active partition correctly. But if the business event timestamp doesn't match the creation timestamp — for example, the row's "transaction date" is from 2023 while its "created in warehouse date" is today — your partitioning column and your reporting column diverge.

This isn't a pollingExpression problem per se; it's a partition filter choice. If your IR filter is on a creation/load timestamp, late-arriving rows land in the current partition and are detected by polling — they appear in reports correctly because reports filter on the business date, not on partition assignment. If your IR filter is on the business event date instead, a row inserted today with a 2023 business date belongs in the 2023 partition — which is typically outside the active refresh window, so even if polling triggered, the partition wouldn't refresh. This is a model design decision to make before configuring incremental refresh, not something pollingExpression can fix.

Query folding for the polling expression. Incremental refresh depends on the partition's sourceExpression folding cleanly to the source — if it doesn't, Power BI pulls all rows and filters locally, defeating the purpose. The same constraint applies to pollingExpression: if your polling M code doesn't fold, Power BI may execute the MAX calculation locally after pulling the underlying column, which silently re-introduces the storage cost you were trying to avoid.

Two practical safeguards:

  • Use a native SQL query inside the polling expression (as in section 6) rather than chained M transformations. Native queries bypass folding ambiguity at the cost of being source-specific.
  • Verify the generated query. For SQL-based sources, Power Query's query plan diagnostics (or the source's query log) can confirm that the polling MAX is being computed source-side.

If you go the native SQL route, watch for gateway timeout limits on long-running polling queries. The polling query runs once per partition; a model with 60 monthly partitions runs 60 polling queries per refresh. If any one of them runs slow, refresh history will surface the timeout.

NULL handling. The accountForNull pattern from section 6 isn't optional. Scenarios where the polling query can legitimately return null:

  • An empty partition (no rows in the date range yet)
  • A defensive view or wrapper procedure that returns no rows instead of failing
  • A schema drift or permissions issue that the connector swallows silently
  • A query parsing issue (e.g., column renamed at the source) that returns null without an error

If the polling query returns null and your expression returns null, Power BI compares null to the previous bookmark (which might also be null after the first refresh on an empty partition) and may treat the partition as unchanged.

Use the deterministic #datetime(1901, 1, 1, 0, 0, 0) fallback. This produces a stable, comparable value across refreshes. The first time a partition gets real data, its bookmark moves from the 1901 sentinel to the actual MAX, triggering a single refresh. From then on, polling works normally.

Do not use DateTime.LocalNow() as a fallback. It's tempting to think "if the polling fails, force a refresh next cycle." But DateTime.LocalNow() changes every refresh execution. If your source has a real outage, every partition's polling query returns null, every partition's fallback evaluates to a new DateTime.LocalNow(), and Power BI attempts a full re-import of every historical partition on the next refresh — usually against the same struggling source. This pattern has crashed gateway clusters and is not worth the marginal "alerting" benefit. Stick with the 1901 baseline.


13. What changes after deploying this

  • Late edits to historical partitions are caught on the next refresh cycle instead of being silently missed
  • The polling column is not imported into the model — zero VertiPaq cost
  • The model's metadata grows by one scalar bookmark per partition (negligible — a few bytes per partition)
  • Refresh time increases by one scalar query per partition (typically sub-second per query against an indexed source)
  • Partitions that haven't changed continue to skip correctly — incremental refresh's performance benefit is preserved
  • Refresh history becomes meaningful: "Refreshed" entries reflect actual business-side changes, not warehouse housekeeping noise
  • The TMDL refresh-policy block becomes a source-controlled artifact that can be diffed, code-reviewed, and rolled forward through environments

The fix is a single TMDL edit. Roughly 15 lines of M code replace a single-line expression. Zero storage impact. Catches a class of bugs that the default configuration cannot surface or warn about.


14. Transferable pattern

When configuring Power BI incremental refresh against any warehouse-backed table, the questions to answer, in order:

  1. Can a row's content change after creation? If no, the default polling on a creation timestamp is correct. Stop here.
  2. Is there a business-side modification timestamp owned by the source application? If no, get one added at the source. There is no purely warehouse-side workaround that produces correct semantics for mutable data.
  3. Does the business modification timestamp meet the conditions in section 5? (Updated on every business write, consistent timezone, sufficient precision, defined null handling, populated for full row history.) If not, fix those at the source before relying on the column.
  4. Are deletes a concern? If yes, identify whether your source uses soft deletes (use the pattern in section 11) or hard deletes (use a tombstone table, CDC, or scheduled historical refresh).
  5. Configure the polling expression as a direct source query rather than a model-column reference. This keeps the polling column out of VertiPaq.
  6. Verify gateway permissions, query folding, and first-refresh behavior in a test workspace before deploying to production.
  7. Source-control the refresh policy. Use .pbip format and Git. Diff the TMDL on every change. Treat the Power BI Desktop incremental refresh dialog as unsafe on models with custom polling expressions.

Most reports don't need this level of care. For reports driving operational decisions or showing data that gets corrected over time — finance, claims, compliance, regulated systems, anything with a source-of-truth application behind it — it's the difference between accurate reporting and silent staleness.


Closing thoughts

Power BI's pollingExpression is more capable than the Power BI Desktop UI implies. Most teams configure incremental refresh once through the dropdown and never revisit it, never realizing that the same hook can run arbitrary M against any source, return any scalar, and detect any kind of change — without paying VertiPaq storage for the underlying signal.

But the mechanism is the easy part. The harder part — and the part most articles underweight — is choosing a change signal that represents business truth rather than ETL noise. Get the signal right and the mechanism is straightforward. Get the signal wrong and no amount of XMLA scripting will produce correct semantics.

The next time you build incremental refresh against a fact table, the first question to ask isn't "which column do I poll on?" It's: can a row's content change after the row's creation date, and which timestamp on this table reflects business meaning rather than warehouse activity?

If you can answer that, the TMDL edit is the easy part.


Further reading


Notes from production Power BI engineering. Substitute your own schema, connector, and timestamp column names when adapting.