Have you ever felt like your dbt models are a bit too ‘one-size-fits-all’? You’re not alone. Many analytics engineers start with the basics and only later realize that tailoring configurations can significantly improve performance and maintainability.
In this blog, we will explore how dbt configs empower you to fine-tune your data transformations, ensuring they align with your project’s unique needs.
What Are dbt Configs?
dbt configs are a way to specify and manage how your data is stored and loaded. They let you tweak model, snapshot, seed, and test behavior for your use cases. In simpler terms, they specify how your tables are materialized, named, partitioned, and more.
In a nutshell, it defines how dbt should process, store, and organize your data.
Types of dbt Configs
dbt supports various configurations on various resources. Let’s discuss the types of resources that can be configured:
- Model configs: These control how models are materialized. You can define materialization as a view, a table, or incremental. They also allow you to specify where data resides and how it should be loaded.
- Seed configs: This configuration helps to configure how CSV data is loaded into your data warehouse.
- Snapshot configs: This configuration can help you to define and capture slowly changing dimensions.
- Test configs: Test configs help you to define custom test behaviors or severity levels.
- Analysis configs: These help you control how analysis SQL files are handled.
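To make this concrete, here is a minimal snapshot definition using snapshot configs (the snapshot name, source, and timestamp column are illustrative):

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

-- Each run compares rows against the snapshot table and records changes,
-- capturing slowly changing dimensions over time.
SELECT * FROM {{ source('raw', 'customers') }}

{% endsnapshot %}
```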
Ways to Apply Configs in dbt
dbt configurations can be applied in three ways:
- Using dbt_project.yml
- Using the config() block in the model file
- Using CLI flags
Let’s discuss each one of them in detail.
Using dbt_project.yml
The dbt_project.yml file allows you to define default configurations for your entire project or for specific models. This can be used to apply the same strategy project-wide.
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
Using the config() block in a model file
You can set configs at the top of your .sql model file using the {{ config(...) }} Jinja macro.
{{ config(
materialized='incremental',
unique_key='id'
) }}
SELECT * FROM raw.customers
Using CLI flags
Some configurations can also be passed via the CLI or during job execution in CI/CD environments. This approach is typically used for temporary overrides or parameterization. Note that --vars does not override configs directly; it passes variables that a model can read via var() inside its config() block.
dbt run --select my_model --vars '{"materialized": "table"}'
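For the CLI example above to take effect, the model itself must read the variable, since --vars only sets values that models reference via var(). A minimal sketch (the model name and the stg_customers reference are illustrative):

```sql
-- models/my_model.sql
-- Reads the 'materialized' variable passed via --vars,
-- falling back to 'view' when the variable is not set.
{{ config(materialized=var('materialized', 'view')) }}

SELECT * FROM {{ ref('stg_customers') }}
```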
Explore best practices for implementing dbt CI/CD pipelines to automate your data transformation workflows.
Hierarchy and Precedence of Configs
dbt applies configurations in the following order of precedence (from lowest to highest):
- dbt_project.yml
- Model file {{ config(...) }}
- CLI arguments (--vars)
Thus, an inline config overrides project-wide settings, and CLI flags override both.
Most Common dbt Model Configs
The table below summarizes the most commonly used model configs with their purpose:
Config | Purpose | Example |
--- | --- | --- |
materialized | View/table/incremental/ephemeral | materialized='incremental' |
schema | Override the default schema | schema='analytics' |
alias | Override the table name | alias='final_customer_data' |
tags | Add metadata tags | tags=['daily', 'finance'] |
unique_key | Key for incremental merges | unique_key='id' |
on_schema_change | How to handle new columns | on_schema_change='append_new_columns' |
persist_docs | Sync the dbt docs with the warehouse | persist_docs={'columns': true} |
partition_by | Partition the table (BigQuery/Snowflake) | partition_by={'field': 'created_at'} |
cluster_by | Cluster for performance | cluster_by=['customer_id'] |
Let’s explore each of them with practical examples.
Materialized
This config defines how the model will be persisted in the data warehouse. It can be set to view (the default), table, incremental, or ephemeral.
Example:
{{ config(materialized='table') }}
Use Cases:
- Views should be used for lightweight transformations where the data doesn’t need to be stored.
- A table should be used where query performance matters and the data does not change frequently.
- The incremental strategy is used for large datasets where only new or changed rows should be processed on each run.
- The ephemeral strategy is used where models should not be stored and serve only as temporary intermediate logic.
To explore different materialization strategies and their use cases, check out our guide on dbt materializations.
Schema
It is used to override the schema in which the model is placed. This is useful when models need to route data to different schemas, e.g., gold/silver layers or staging/marts. Note that by default, dbt appends the custom schema to the target schema, producing a name like target_schema_analytics.
{{ config(schema='analytics') }}
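If you want the custom schema name used as-is instead of being appended to the target schema, the commonly documented approach is to override the generate_schema_name macro, roughly like this:

```sql
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {# No custom schema configured: fall back to the target schema #}
        {{ target.schema }}
    {%- else -%}
        {# Use the configured schema name exactly as written #}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```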
Alias
It is used to override the model’s table name in the database. This helps to name the table differently than its default, which is picked from the .sql filename.
{{ config(alias='final_customer_data') }}
Tags
As the name suggests, it allows adding arbitrary tags to models. This is useful for documentation, testing, and job orchestration.
{{ config(tags=['finance', 'daily_run']) }}
You can run models by tag:
dbt run --select tag:daily_run
Persist Docs
This is used to persist model descriptions as table and column comments in the database. This helps downstream applications (like Looker, Tableau) to read the dbt-generated documentation.
{{ config(persist_docs={'relation': true, 'columns': true}) }}
Unique Key
This is needed for incremental models to define the key used to merge new records. It helps dbt identify which rows to update versus insert.
{{ config(
materialized='incremental',
unique_key='customer_id'
) }}
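In practice, an incremental model usually pairs unique_key with an is_incremental() filter so only new or changed rows are scanned on each run. A sketch, assuming the source has an updated_at column and a stg_customers staging model exists:

```sql
{{ config(
    materialized='incremental',
    unique_key='customer_id'
) }}

SELECT * FROM {{ ref('stg_customers') }}

{% if is_incremental() %}
  -- On incremental runs, only pull rows newer than the latest row
  -- already present in this model's target table ({{ this }}).
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```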
On Schema Change
This configuration controls the behavior of incremental models when the schema of the source changes. It can be set to ignore, fail, append_new_columns, or sync_all_columns, according to the use case. It helps avoid errors or failures when a new column is added upstream.
{{ config(on_schema_change='append_new_columns') }}
Partition by
This is used to partition and cluster tables for performance, improving query performance on large datasets at scale.
{{ config(
materialized='table',
partition_by={'field': 'created_at', 'data_type': 'timestamp'},
cluster_by=['customer_id']
) }}
Environment-Specific Configs
You can use Jinja with target.name to define environment-specific behavior. This helps you configure lighter materializations for dev environments.
{{ config(
materialized='table' if target.name == 'prod' else 'view'
) }}
Custom Materializations
Custom materializations (like snapshot+incremental hybrids) can also be created and defined for unique data needs.
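While full details are beyond this post, a custom materialization is defined as a macro block in your project. A heavily simplified skeleton (the name is illustrative, and it omits the hooks, transaction handling, and relation caching a production materialization needs) looks roughly like this:

```sql
-- macros/my_materialization.sql
{% materialization my_simple_view, default %}

  {%- set target_relation = this -%}

  {# The 'main' statement executes the model's compiled SQL #}
  {% call statement('main') -%}
    CREATE OR REPLACE VIEW {{ target_relation }} AS (
      {{ sql }}
    )
  {%- endcall %}

  {{ return({'relations': [target_relation]}) }}

{% endmaterialization %}
```

A model would then opt into it with {{ config(materialized='my_simple_view') }}.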
Example: Real-World Config in a Model
Let’s walk through a real-world example of a model config:
{{ config(
    materialized='incremental',
    unique_key='user_id',
    on_schema_change='append_new_columns',
    tags=['user_engagement', 'daily'],
    schema='marts',
    alias='daily_active_users',
    persist_docs={'relation': true, 'columns': true}
) }}
SELECT
user_id,
session_id,
event_type,
event_timestamp
FROM {{ ref('events') }}
WHERE event_type = 'login'
In this example:
- The model is incrementally updated.
- It is stored in the marts schema with the alias daily_active_users.
- If new columns appear upstream, they’ll be automatically added.
- Tags help identify this model for daily orchestration.
- Documentation is synced with your warehouse.
Best Practices for Using dbt Configs
- Keep configs consistent via dbt_project.yml.
Set baseline configurations per folder or model group.
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: incremental
- Use tags for orchestration.
Tags can be used to differentiate and run models logically.
- Document with persist_docs
Documentation helps other team members to easily navigate and understand. Thus, automated generation of documentation can be used as a single source of truth for reference.
- Use alias sparingly.
Changing table names inline can reduce maintainability and create confusion. It should be used only where necessary.
- Don’t overuse ephemeral models.
Ephemeral models help with reusability, but they can cause performance issues on large datasets. Hence, use ephemeral models wisely.
Debugging and Testing Configs
dbt provides a way to view your applied configs. These configs can be checked using the following commands:
dbt debug
dbt ls --select my_model --output json
Also, dbt Cloud and dbt Docs (dbt docs generate && dbt docs serve) show applied configurations per model.
If you encounter further issues with your dbt configurations, use the dbt debug command to check for errors. Our guide on using dbt debug can help you troubleshoot effectively.
Conclusion
In this blog, we discussed the various types of dbt configs and best practices for using them. dbt configs are a core building block of every dbt project: they let you define and manage how your data is built, stored, and maintained.
If used properly, configs can be leveraged to achieve better performance, maintainability, and documentation across your data project. Hence, dbt configs give you full control of your data and the pipeline to handle product-grade transformations.
Ready to supercharge your dbt workflows? Try Hevo Transformer today to streamline your data transformations and integrate seamlessly with your dbt setup!
Frequently Asked Questions (FAQs)
1. Where should I define my dbt configs, dbt_project.yml, or inside model files?
Use dbt_project.yml for setting defaults across the entire project, and use the {{ config(...) }} block inside model files for model-specific overrides. Model-level configs take precedence over project-level configs.
2. What does the materialized config do in dbt?
It defines how dbt builds the model and how data is loaded into your table. It offers several materialization options, such as view, table, incremental, and ephemeral.
3. How can I apply different configs for development and production environments?
You can use Jinja logic with target.name to apply environment-specific configs: {{ config(materialized='table' if target.name == 'prod' else 'view') }}. This helps ensure efficient resource utilization in both development and production environments.