Skip to main content

freshness

dbt_project.yml
sources:
<resource-path>:
+freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day

Definition

A freshness block is used to define the acceptable amount of time between the most recent record, and now, for a table to be considered "fresh".

In the freshness block, one or both of warn_after and error_after can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

In most cases, the loaded_at_field is required. Some adapters support calculating source freshness from the warehouse metadata tables and can exclude the loaded_at_field.

If a source has a freshness: block, dbt will attempt to calculate freshness for that source:

  • If loaded_at_field is provided, dbt will calculate freshness via a select query.
  • If loaded_at_field is not provided, dbt will calculate freshness via warehouse metadata tables when possible.

Currently, calculating freshness from warehouse metadata tables is supported on the following adapters:

Freshness blocks are applied hierarchically:

  • A freshness and loaded_at_field property added to a source will be applied to all tables defined in that source.
  • A freshness and loaded_at_field property added to a source table will override any properties applied to the source.

This is useful when all of the tables in a source have the same loaded_at_field, as is often the case.

To exclude a source from freshness calculations, explicitly set freshness: null.

In state-aware orchestration, dbt uses the warehouse metadata by default to check if sources (or upstream models in the case of Mesh) are fresh. For more information on how freshness is used by state-aware orchestration, see Advanced configurations.

loaded_at_field

Optional on adapters that support pulling freshness from warehouse metadata tables, required otherwise.

A column name (or expression) that returns a timestamp indicating freshness.

If using a date field, you may have to cast it to a timestamp:

loaded_at_field: "completed_date::timestamp"

Or, depending on your SQL variant:

loaded_at_field: "CAST(completed_date AS TIMESTAMP)"

If using a non-UTC timestamp, cast it to UTC first:

loaded_at_field: "convert_timezone('Australia/Sydney', 'UTC', created_at_local)"

count

(Required)

A positive integer for the number of periods where a data source is still considered "fresh".

period

(Required)

The time period used in the freshness calculation. One of minute, hour or day

filter

(optional)

Add a where clause to the query run by dbt source freshness in order to limit data scanned.

This filter only applies to dbt's source freshness queries - it will not impact other uses of the source table.

This is particularly useful if:

  • You are using BigQuery and your source tables are partitioned tables
  • You are using Snowflake, Databricks, or Spark with large tables, and this results in a performance benefit

Examples

Complete example

models/<filename>.yml

version: 2

sources:
- name: jaffle_shop
database: raw
config:
# changed to config in v1.9
freshness: # default freshness
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}

loaded_at_field: _etl_loaded_at

tables:
- name: customers # this will use the freshness defined above

- name: orders
config:
freshness: # make this a little more strict
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
# Apply a where clause in the freshness query
filter: datediff('day', _etl_loaded_at, current_timestamp) < 2


- name: product_skus
config:
freshness: # do not check freshness for this table

When running dbt source freshness, the following query will be run:

select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders

where datediff('day', _etl_loaded_at, current_timestamp) < 2

Was this page helpful?

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

0