Have you ever felt like your dbt models are a bit too ‘one-size-fits-all’? You’re not alone. Many analytics engineers start with the basics and later realize that tailoring configurations can significantly improve performance and maintainability.

In this blog, we will learn how dbt configs empower you to fine-tune your data transformations, ensuring they align with your project’s unique needs.

What Are dbt Configs?

dbt configs are a way to specify and manage how your data is stored and loaded. They let you tweak model, snapshot, seed, and test behavior for your use cases. In simpler terms, configs specify how your tables are materialized, named, partitioned, and more.

In a nutshell, it defines how dbt should process, store, and organize your data.

Types of dbt Configs

dbt supports various configurations on various resources. Let’s discuss the types of resources that can be configured:

  • Model configs: These control how models are materialized. You can materialize a model as a view, a table, or incrementally. They also let you specify where data resides and how it is loaded.
  • Seed configs: This configuration helps to configure how CSV data is loaded into your data warehouse.
  • Snapshot configs: This configuration can help you to define and capture slowly changing dimensions.
  • Test configs: Test configs help you to define custom test behaviors or severity levels.
  • Analysis configs: These control how SQL files in the analyses folder are compiled.
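As a concrete example of a test config, severity can be relaxed so a failing uniqueness check warns instead of erroring the run; the model and column names below are illustrative:

```yaml
# models/schema.yml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique:
              config:
                severity: warn
```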

Ways to Apply Configs in dbt

dbt configurations can be applied in three ways:

  • Using dbt_project.yml
  • Using the config() block in the model file
  • Using CLI flags

Let’s discuss each one in detail.

Using dbt_project.yml

The dbt_project.yml file allows you to define default configurations for your entire project or specific models. This can be used to implement the same strategy project-wide.

models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
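Several configs can be layered per folder in the same file; the + prefix marks a key as a config rather than a resource path. A sketch (folder and schema names are illustrative):

```yaml
models:
  my_project:
    staging:
      +materialized: view
      +tags: ['staging']
    marts:
      +materialized: table
      +schema: marts
```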

Using the config() block in a model file

You can set configs at the top of your .sql model file using the {{ config(...) }} Jinja macro.

{{ config(
    materialized='incremental',
    unique_key='id'
) }}

SELECT * FROM raw.customers

Using CLI flags

Some configurations can also be passed via the CLI or during job execution in CI/CD environments. This approach is useful for temporary overrides or parameterization. Note that --vars sets Jinja variables rather than configs directly, so a model must read the variable for it to take effect.

dbt run --select my_model --vars '{"materialized": "table"}'
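For the variable above to take effect, the model’s config block has to consume it via var(). A minimal sketch (the model and source names are illustrative):

```sql
-- my_model.sql: materialization driven by a CLI variable,
-- falling back to 'view' when --vars is not passed
{{ config(
    materialized=var('materialized', 'view')
) }}

SELECT * FROM {{ ref('raw_customers') }}
```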

Explore best practices for implementing dbt CI/CD pipelines to automate your data transformation workflows.

Hierarchy and Precedence of Configs

dbt applies configurations in the following order of precedence (from lowest to highest):

  1. dbt_project.yml
  2. Model file {{ config(…) }}
  3. CLI arguments (--vars)

Thus, inline config overrides project-wide settings, and CLI flags override both of them.

Most Common dbt Model Configs

The table below summarizes the most used model configs with their purpose

| Config | Purpose | Example |
| --- | --- | --- |
| materialized | How the model is persisted (view/table/incremental/ephemeral) | materialized='incremental' |
| schema | Override the default schema | schema='analytics' |
| alias | Override the table name | alias='final_customer_data' |
| tags | Add metadata tags | tags=['daily', 'finance'] |
| unique_key | Key for incremental merges | unique_key='id' |
| on_schema_change | How to handle new columns | on_schema_change='append_new_columns' |
| persist_docs | Sync dbt docs with the warehouse | persist_docs={'columns': true} |
| partition_by | Partition the table (e.g., BigQuery) | partition_by={'field': 'created_at'} |
| cluster_by | Cluster for performance | cluster_by=['customer_id'] |

Let’s explore each of them with practical examples.

Materialized

This config defines how the model will be persisted in the data warehouse. It can be set to view (default), table, incremental, or ephemeral.

Example:

{{ config(materialized='table') }}

Use Cases:

  • Views should be used for lightweight transformations where data doesn’t need to be stored.
  • Tables should be used where query performance matters and data doesn’t change frequently.
  • Incremental models suit large datasets where only new or changed rows should be processed on each run.
  • Ephemeral models suit temporary intermediate logic that shouldn’t be stored; they are inlined as CTEs into downstream models.

To explore different materialization strategies and their use cases, check out our guide on dbt materializations.

Schema

It is used to override the schema in which the model is built. This is useful when models need to be routed to different schemas, e.g., gold/silver layers or staging/marts. Note that by default dbt appends the custom schema to your target schema (producing something like dbt_dev_analytics); you can change this by overriding the generate_schema_name macro.

{{ config(schema='analytics') }}
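If you want the custom schema used verbatim instead of appended to the target schema, you can override dbt’s built-in generate_schema_name macro in your macros folder. A minimal sketch:

```sql
-- macros/generate_schema_name.sql
-- Use the custom schema exactly as given; fall back to the target schema
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```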

Alias

It is used to override the model’s table name in the database. This helps to name the table differently than its default, which is picked from the .sql filename.

{{ config(alias='final_customer_data') }}

Tags

As the name suggests, it allows adding arbitrary tags to models. This is useful for documentation, testing, and job orchestration.

{{ config(tags=['finance', 'daily_run']) }}

You can run models by tag:

dbt run --select tag:daily_run
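Tags don’t have to be set model by model; they can also be applied to whole folders in dbt_project.yml, which keeps orchestration labels consistent (folder names are illustrative):

```yaml
models:
  my_project:
    marts:
      +tags: ['daily_run']
```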

Persist Docs

This is used to persist model descriptions as table and column comments in the database. This helps downstream applications (like Looker, Tableau) to read the dbt-generated documentation.

{{ config(persist_docs={'relation': true, 'columns': true}) }}
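persist_docs only pushes descriptions that already exist, so the model and its columns need descriptions in a properties file. A minimal sketch (file name and descriptions are illustrative):

```yaml
# models/schema.yml
version: 2

models:
  - name: final_customer_data
    description: "One row per customer with their latest attributes."
    columns:
      - name: customer_id
        description: "Primary key for the customer."
```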

Unique Key

This is needed for incremental models to define the key used to merge new records. It helps dbt to identify which rows to update vs insert.

{{ config(
    materialized='incremental',
    unique_key='customer_id'
) }}
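In practice, an incremental model pairs unique_key with an is_incremental() filter so that only new or changed rows are scanned on subsequent runs. A sketch (stg_customers and updated_at are assumptions):

```sql
{{ config(
    materialized='incremental',
    unique_key='customer_id'
) }}

SELECT
    customer_id,
    customer_name,
    updated_at
FROM {{ ref('stg_customers') }}
{% if is_incremental() %}
-- On incremental runs, only pick up rows newer than what's already loaded
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```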

On Schema Change

This configuration controls behavior when the source schema changes in incremental models. It can be set to ignore, fail, append_new_columns, or sync_all_columns, depending on the use case. It helps avoid failures caused by a new column being added upstream.

{{ config(on_schema_change='append_new_columns') }}

Partition by

This is used to partition tables for performance, often alongside cluster_by. Partitioning limits the data scanned by queries, which matters most on large datasets.

{{ config(
    materialized='table',
    partition_by={'field': 'created_at', 'data_type': 'timestamp'},
    cluster_by=['customer_id']
) }}

Environment-Specific Configs

You can use Jinja + target.name to define environment-specific behavior. This helps you to configure lighter materializations for dev environments.

{{ config(
    materialized='table' if target.name == 'prod' else 'view'
) }}
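dbt_project.yml is also rendered with Jinja, so the same conditional can be applied to a whole folder rather than one model (folder names are illustrative):

```yaml
models:
  my_project:
    marts:
      +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"
```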

Custom Materializations

Custom materializations (like snapshot+incremental hybrids) can also be created and defined for unique data needs.

Example: Real-World Config in a Model

Let’s walk through a real-world example of a model config:

{{ config(
    materialized='incremental',
    unique_key='user_id',
    on_schema_change='append_new_columns',
    tags=['user_engagement', 'daily'],
    schema='marts',
    alias='daily_active_users',
    persist_docs = {'relation': true, 'columns': true}
) }}

SELECT
    user_id,
    session_id,
    event_type,
    event_timestamp
FROM {{ ref('events') }}
WHERE event_type = 'login'

In this example:

  • The model is incrementally updated.
  • It is stored in the marts schema with an alias of daily_active_users.
  • If new columns appear upstream, they’ll be automatically added.
  • Tags help identify this model for daily orchestration.
  • Documentation is synced with your warehouse.

Best Practices for Using dbt Configs

  • Keep configs consistent via dbt_project.yml.

Set baseline configurations per folder or model group.

models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: incremental
  • Use tags for orchestration.

Tags let you group models logically and run those groups selectively.

  • Document with persist_docs

Documentation helps other team members navigate and understand your models. Persisting generated documentation to the warehouse keeps a single source of truth alongside the data itself.

  • Use an alias sparingly.

Renaming tables inline can reduce maintainability and create confusion, so use aliases only where necessary.

  • Don’t overuse ephemeral models.

Ephemeral models help with reusability, but because they are inlined as CTEs, they can cause performance issues on large datasets. Hence, use ephemeral models wisely.

Debugging and Testing Configs

dbt provides a way to view your applied configs. These can be checked using the following commands:

dbt debug
dbt ls --select my_model --output json

Also, dbt Cloud and dbt Docs (dbt docs generate && dbt docs serve) show applied configurations per model.

If you encounter further issues with your dbt configurations, use the dbt debug command to check for errors. Our guide on using dbt debug can help you troubleshoot effectively.

Conclusion

In this blog, we discussed the various types of dbt configs and best practices for using them. dbt configs are a core building block of every dbt project: they let you be the boss of your data by defining and managing how it is built, stored, and maintained.

If used properly, configs can be leveraged to achieve better performance, maintainability, and documentation across your data project. Hence, dbt configs give you full control of your data and the pipeline to handle product-grade transformations.

Ready to supercharge your dbt workflows? Try Hevo Transformer today to streamline your data transformations and integrate seamlessly with your dbt setup!

Frequently Asked Questions (FAQs)

1. Where should I define my dbt configs: in dbt_project.yml or inside model files?

Use dbt_project.yml for setting defaults across the entire project, and use the {{ config(...) }} block inside model files for model-specific overrides. Model-level configs take precedence over project-level configs.

2. What does the materialized config do in dbt?

It defines how dbt builds the model and how data is loaded into your table. It offers several materialization options: view, table, incremental, and ephemeral.

3. How can I apply different configs for development and production environments?

You can use Jinja logic with target.name to apply environment-specific configs:
{{ config(materialized='table' if target.name == 'prod' else 'view') }}. This helps ensure efficient resource utilization in both development and production environments.

Neha Sharma
Software Engineer

Neha is an experienced Data Engineer and AWS certified developer with a passion for solving complex problems. She has extensive experience working with a variety of technologies for analytics platforms, data processing, storage, ETL and REST APIs. In her free time, she loves to share her knowledge and insights through writing on topics related to data and software engineering.