Modern data teams know that stale or outdated data is a hidden liability. In fact, Gartner estimates that poor data quality costs companies an average of $12.9 million per year. We often think of “freshness” like the expiration date on milk – if your data is old, downstream users can get “sick” by making decisions on numbers that no longer reflect reality. In this article, we will discuss how dbt source freshness helps detect stale data early and automatically, improving the trustworthiness of your analytics.

What is dbt Source Freshness?

“Freshness” in data terms means how up-to-date your data is. In dbt, source freshness is a built-in feature that checks whether your raw data tables have been updated recently. You configure a loaded_at_field – a timestamp column in the source table – and set expected thresholds (e.g., updated within the last hour). Then, dbt runs a freshness check against that field.

Consider a grocery store checking the expiration date on milk cartons before selling them. If the milk has expired, it will no longer be useful. dbt source freshness does something similar for your data: it examines the latest row timestamp and compares it to “now”. If the data is older than you expect (say older than 24 hours), the test will warn or fail. In practice, you just add a freshness block to your source YAML (the same files where you declare dbt sources) and specify loaded_at_field.

For example:

sources:
  - name: my_source
    database: raw
    schema: analytics
    tables:
      - name: customers
        freshness:
          warn_after:
            count: 1
            period: day
          error_after:
            count: 2
            period: day
        loaded_at_field: updated_at

This tells dbt to expect the customers table to be updated at least once a day: if the newest record is more than 1 day old, dbt raises a warning, and if it is more than 2 days old, the check errors. In short, dbt source freshness tests the recency of your raw data so you can spot problems right at the start of the pipeline.

To better understand how these checks fit into dbt’s overall design, explore the dbt architecture in more detail here.

Why is it Essential to Test Data Sources for Freshness?

Detect Pipeline Failures Early

When ingestion jobs fail or slow down, downstream analytics break silently. Testing at the source lets you catch those failures immediately. If your daily CSV load stops, the freshness test alerts you the moment data stops flowing. For example, one team discovered a two‑week ingestion outage only because a dbt freshness test alerted them that no new rows were arriving. Without that test, they would have continued reporting incomplete data!

Maintain Stakeholder Trust

Updated data builds confidence. Data stakeholders want to know they can rely on dashboards and reports. By surfacing simple metrics like “last updated at” automatically, you communicate data recency to every user. Generally, teams use freshness metadata in dashboards or documentation to reassure business users that “yes, today’s report is based on data as recent as last night.”

Enhance Data Quality Monitoring

Fresh data means your warehouse reflects the current state of the business. By measuring freshness, you catch issues that schema tests or uniqueness tests miss. For example, a not_null test won’t tell you if the data hasn’t arrived at all; a freshness test flags that scenario immediately.
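To make the contrast concrete, here is a minimal YAML sketch (the source, table, and column names are purely illustrative) that pairs a standard not_null column test with a freshness check on the same source table. The column test only validates rows that actually arrived, while the freshness block catches the case where nothing arrived at all:

sources:
  - name: my_source                 # illustrative source name
    schema: analytics
    tables:
      - name: orders
        loaded_at_field: loaded_at  # timestamp column written by your loader
        freshness:
          warn_after: {count: 12, period: hour}   # no new rows for 12h -> warn
          error_after: {count: 24, period: hour}  # no new rows for 24h -> error
        columns:
          - name: order_id
            tests:
              - not_null            # only checks rows that actually landed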

Improve Documentation and Transparency

dbt automatically records freshness results as metadata, making your pipelines more self-documenting. When you run dbt source freshness, the results are written to target/sources.json by default, and you can choose a different path with --output, for example --output target/source_freshness.json. Teams can use this output in dashboards or reports to make data recency visible, without building custom infrastructure.

How does dbt Source Freshness Work?

The process is straightforward. In your source definition YAML, you add a freshness block and specify loaded_at_field. dbt then compiles a SQL query that finds the maximum timestamp in that field. For the customers example above, the compiled query looks roughly like this:

SELECT
  MAX(updated_at) AS max_loaded_at,
  CURRENT_TIMESTAMP AS now
FROM raw.analytics.customers

This query returns the most recent data timestamp for each source table. dbt compares max_loaded_at to now, applies any filters if provided, and checks against your warn_after and error_after thresholds. If the data’s age exceeds your limits, the freshness test fails (or warns). If your warehouse supports it, dbt can also use table metadata instead of a custom column.

Since dbt 1.7, some adapters, such as Snowflake, Redshift, and BigQuery, can retrieve the last update time automatically from warehouse metadata. In most cases, though, you’ll simply point loaded_at_field at a timestamp column. Make sure this field is in UTC or cast it appropriately; timezone mismatches are a common source of false freshness alerts.
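For adapters that support metadata-based freshness, the configuration might look like the sketch below (source and table names are illustrative); the key difference from the earlier example is that no loaded_at_field is declared, so dbt asks the warehouse for the table’s last-modified time instead of scanning a column:

sources:
  - name: my_source        # illustrative
    database: raw
    schema: analytics
    tables:
      - name: customers
        # No loaded_at_field: on supported adapters dbt reads the last-altered
        # time from warehouse metadata instead of running a MAX() query.
        freshness:
          warn_after: {count: 1, period: day}
          error_after: {count: 2, period: day}

This can also be cheaper on very large tables, since no column scan is needed.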

To understand the differences between dbt Core and dbt Cloud, which may impact your freshness configuration, check out this comparison.

To execute the check, simply run the dbt command:

dbt source freshness

This command runs freshness checks on all sources with a freshness config. By default, it prints results to the console and writes them to target/sources.json. For example:

dbt source freshness --output target/source_freshness.json

This will save the raw output. You can then review which tables are fresh and which are stale. Remember that dbt supports both warn_after and error_after in each freshness block; a warning won’t stop your pipeline, but an error will cause the dbt source freshness command to fail. Many teams use a strict warn_after (quick alert) and a larger error_after (hard stop) to get notified early and still have time to intervene.

How to use dbt source freshness effectively?

You can follow these steps to leverage and integrate freshness checks fully into your workflow:

1. Configure your sources

Edit your sources.yml (or schema YAML) and define each source table with a freshness block. Specify the timestamp column (loaded_at_field) and your warn_after/error_after thresholds. For example:

sources:
  - name: sales_db
    database: raw
    schema: sales
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 12, period: hour}
    loaded_at_field: last_load_time
    tables:
      - name: transactions
      - name: customers

This setup says the whole sales_db source should be updated at least every 6 hours (warning) and definitely within 12 hours (error). Keep your YAML organized: if all tables share the same loaded_at_field, define it at the source level to avoid repetition, and override it per table only where needed (see the sketch below). Check the dbt sources docs for details on source configuration.
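As a sketch of that pattern (table and column names are illustrative), the source-level defaults apply to every table, and a single table can override them where its load schedule differs:

sources:
  - name: sales_db
    database: raw
    schema: sales
    # Defaults inherited by every table in this source
    loaded_at_field: last_load_time
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 12, period: hour}
    tables:
      - name: transactions
      - name: customers
        # This table loads less frequently, so relax its thresholds
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 1, period: day}
          error_after: {count: 2, period: day}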

2. Run the freshness command

Execute:

dbt source freshness

This runs your freshness checks against the warehouse. The output shows each source table and its state (pass, warn, or error), and, as the dbt command reference notes, the results are stored in target/sources.json. If any check errors, the command exits with a failure and your pipeline job will fail. You can also redirect the output to a JSON file to inspect recency or trigger alerts:

dbt source freshness --output target/source_freshness.json

3. Automate checks

Schedule the freshness command to run on a regular cadence (e.g. hourly or daily) as part of your CI/CD or orchestration (Airflow, GitHub Actions, etc.). Automating this ensures you continuously monitor ingestion without manual effort. If your stack supports it, add a “freshness check” step right after your data load job. This way, ingestion failures or delays get caught before transformation jobs even start.
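As one illustration, a scheduled GitHub Actions workflow could run the check every hour. This is a minimal sketch only: the adapter package, profile location, and secret name are assumptions you would adapt to your own project:

# .github/workflows/source_freshness.yml (illustrative)
name: dbt source freshness
on:
  schedule:
    - cron: "0 * * * *"          # every hour
jobs:
  freshness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake     # swap in your adapter
      - run: dbt source freshness --output target/source_freshness.json
        env:
          DBT_PROFILES_DIR: .              # assumes profiles.yml is in the repo
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # hypothetical secret

Because the command exits non-zero on an error-level breach, the workflow run itself fails and you get notified through your normal CI alerts.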

4. Monitor and alert

Integrate the results into your observability tools. You could parse the JSON output to raise alerts (Slack, email) or include freshness status on a dashboard. Some teams add a freshness summary to their dbt docs site or data catalog for data consumers to view. The key is to surface freshness metadata automatically, rather than writing custom SQL or building separate pipelines.
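For example, a short script could read the JSON artifact and flag any sources that did not pass. This is a rough sketch in Python: it assumes the artifact’s usual shape (a top-level results list whose entries carry unique_id and status), which can vary slightly between dbt versions, and it leaves the actual Slack or email call as a placeholder:

import json
import sys

# Path written by `dbt source freshness --output ...` (default is target/sources.json)
ARTIFACT_PATH = "target/source_freshness.json"

with open(ARTIFACT_PATH) as f:
    artifact = json.load(f)

# Collect every source whose freshness check did not pass
stale = [
    (r.get("unique_id", "<unknown>"), r.get("status"))
    for r in artifact.get("results", [])
    if r.get("status") not in ("pass", None)
]

if stale:
    for unique_id, status in stale:
        # Replace this print with a Slack webhook or email call in a real setup
        print(f"STALE SOURCE: {unique_id} (status: {status})")
    sys.exit(1)  # non-zero exit so the scheduler marks the job as failed

print("All sources fresh.")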

5. Build downstream models conditionally

For maximum efficiency, combine freshness with dbt’s selection arguments. The dbt docs recommend running the freshness check first and then building only the models whose sources have new data. For example:

dbt source freshness

dbt build --select source_status:fresher+

The first command records the freshness state of each source; the second (--select source_status:fresher+) rebuilds only the models downstream of sources with newer data. Note that the source_status selector compares the current results against a previous run’s artifacts, so in practice you also pass --state pointing at the previous run’s target directory (see the sketch below). This avoids wasted computation on stale data: running freshness first and then filtering builds ensures your models always use the latest data and keeps your pipeline efficient.
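Put together, a CI or orchestration step for this pattern might look roughly like the following shell sketch; the path to the previous run’s artifacts is an assumption about where your orchestrator keeps them:

# Assumes ./prev_run_artifacts contains the previous run's target/ directory (including sources.json)
dbt source freshness
dbt build --select "source_status:fresher+" --state ./prev_run_artifacts
# Afterwards, copy the new target/ directory to ./prev_run_artifacts as the next baseline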

6. Iterate and refine

Choosing the right thresholds is important. Analyze historical loads (e.g., query the maximum gap between loads) to pick realistic warn_after values. Consider adding filters for complex sources: dbt freshness accepts a SQL filter to exclude irrelevant rows (e.g., soft-deleted records), as sketched below. Keep your configs DRY by applying defaults at the source level and overriding only when necessary. Over time, adjust the thresholds as your data latency patterns evolve.
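A rough sketch of such a filter is shown below (the column name is illustrative); the filter string is appended to the WHERE clause of the freshness query, so soft-deleted rows no longer count toward recency:

sources:
  - name: my_source
    tables:
      - name: customers
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 1, period: day}
          error_after: {count: 2, period: day}
          filter: "is_deleted = false"   # illustrative: ignore soft-deleted rows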

Conclusion

Testing source freshness in dbt is an easy way to improve trust in your data pipelines. By defining loaded_at_field and freshness thresholds, you get automated alerts whenever raw data isn’t updated on time. This detects failures early, builds stakeholder confidence, and rounds out your data quality checks. Best of all, dbt handles this natively without extra coding: just configure and run dbt source freshness.

Get started today and see how freshness tests fit into your workflow. Explore dbt’s capabilities and use cases, and try running your first source freshness check. With tools like Hevo supporting dbt, adding automated freshness checks is straightforward – soon you won’t have to guess when data was last updated.

Looking to streamline your transformation workflows? Try Hevo Transformer, built on dbt Core, to directly transform your data within your warehouse.

FAQs

1. What are the common challenges and solutions when using dbt source freshness?

Common challenges include missing timestamp columns, timezone mismatches, and rows that shouldn’t count toward freshness (such as soft-deleted records). Typical solutions are to use warehouse metadata on supported adapters when no reliable timestamp column exists, to store or cast loaded_at_field in UTC, and to add a filter to the freshness block so irrelevant records are excluded.

2. What are the best practices for using dbt source freshness?

Test freshness as soon as data lands, before running downstream models. Use both warn_after and error_after thresholds, for example warn at 6 hours and error at 12 hours. Document and periodically review the thresholds, and combine freshness checks with other tests and alerting.

3. What additional features does dbt source freshness offer?

Filters: you can add a filter under the freshness block to limit which rows are checked, for example excluding inactive records so the freshness query only scans current data. Warehouse metadata: on supported adapters, dbt can skip the SELECT query and fetch freshness from system tables (e.g. Snowflake’s metadata).

4. How to run freshness tests in dbt?

First, define freshness tests in your YAML as shown above (adding freshness and loaded_at_field to your source tables). Then simply run the built-in command: dbt source freshness. This will evaluate all configured freshness tests. Check the console or the generated target file for results.

5. What testing strategies should I follow in dbt?

Run freshness checks plus basic data tests (row counts, not_null on key fields) right after ingestion. Continue writing schema and logic tests (not_null, unique, custom tests) on your models. Automate tests in CI/CD: run dbt test and dbt source freshness on every deploy or on a schedule, and alert if anything fails.

Khawaja Abdul Ahad
Data Analytics Expert

Khawaja Abdul Ahad is a seasoned Data Scientist and Analytics Engineer with over 4 years of experience. Specializing in data analysis, predictive modeling, NLP, and cloud solutions, he transforms raw data into actionable insights. Passionate about leveraging ML-based solutions, Khawaja excels in creating data-driven strategies that drive business growth and innovation.