If you’re using dbt (Data Build Tool), you’re already on the right track when building scalable data models. As teams grow and projects get more complex, it’s easy for a dbt setup to get messy, and things start breaking.
The good news is that these are common problems with known solutions. You can keep your dbt project clean and trustworthy by adopting a few best practices.
If you’re new to dbt or want a solid introduction to how it works, start with a clear overview of what dbt is and why it’s useful. In this article, we’ll walk through 12 dbt best practices that will help you write better code and have more confidence in your data.
1. Establish a Clear Project Structure
A clear project structure means organizing your dbt files and folders in a logical way. That usually means separating models, tests, documentation, and config files into their own subfolders.
In short, a clean folder system that lets anyone on your team open the project and immediately understand what’s going on.
Without structure, it’s hard to find your models, trace how data flows, or pinpoint what’s wrong. A well-structured project is easier to manage, scale, and collaborate on.
To better understand how project structure fits into the overall setup, check out this dbt architecture page, which explains how models, sources, and tests work together in dbt.
Example
A typical dbt project is organized into three main folders inside the models/ directory:
models/
├── staging/
├── intermediate/
└── marts/
Each layer has a clear role:
- Staging: This is where raw data is cleaned and standardized.
- Intermediate: You build logic here that prepares the data for final use.
- Marts: These are the final outputs: business-ready tables that are used in dashboards and reports.
2. Use Consistent Naming Conventions
A naming convention is a simple, effective way to ensure everyone on your team names things – models, files, columns – in the same consistent and meaningful way.
When names are predictable, it’s much easier to understand what each model does just by looking at it. You spend less time guessing and more time working.
Example
Use prefixes to show where models fit in the pipeline:
- stg_ for staging models
- int_ for intermediate models
- fct_ and dim_ for fact and dimension models
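Applied to the folder structure from earlier, the convention might look like this (model names are hypothetical):

```
models/
├── staging/
│   └── stg_shop__orders.sql
├── intermediate/
│   └── int_orders__joined_customers.sql
└── marts/
    ├── fct_orders.sql
    └── dim_customers.sql
```

With this layout, the prefix alone tells you which layer a model belongs to before you even open the file.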
3. Use Sources and Ref Functions Properly
In dbt, you don’t have to hardcode table names into your SQL. Instead, use the ref() function to reference other models and source() to pull raw tables from your data warehouse.
Using ref() and source() not only keeps your code cleaner but also builds your project’s dependency graph. dbt then handles renamed tables automatically and visualizes data lineage in dbt docs.
To learn more about using sources effectively, check out this guide on dbt sources.
Example
Instead of writing hardcoded SQL like this:
SELECT * FROM raw.sales_data
SELECT * FROM analytics.daily_orders
You can use dbt’s source() and ref() functions:
SELECT * FROM {{ source('raw', 'sales_data') }}
SELECT * FROM {{ ref('daily_orders') }}
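For source('raw', 'sales_data') to work, the source must be declared in a YAML file in your project. A minimal sketch (schema and table names follow the example above):

```yaml
version: 2

sources:
  - name: raw
    schema: raw          # schema in the warehouse holding the raw tables
    tables:
      - name: sales_data
```

Once declared here, the raw table shows up in dbt’s lineage graph just like any model.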
4. Document Models and Sources
In dbt, you can describe what your models and fields do using simple YAML files. This builds documentation right into your project without requiring any extra tools.
Good documentation makes it easier for everyone on your team to understand your data’s meaning and how it’s used. It also saves time by answering common questions right in the project; there’s no need to ask around.
Example
models:
  - name: fct_orders
    description: "Fact table containing all customer orders."
    columns:
      - name: order_id
        description: "Primary key for the order. Uniquely identifies each order."
      - name: total_amount
        description: "Total value of the order."
5. Write and Enforce Data Tests
In dbt, tests let you check assumptions about your data, such as “this column should never be null”. You can write your own SQL tests or use the built-in generic tests (unique, not_null, accepted_values, relationships) for common checks.
Your transformations are only as good as the data you’re transforming. Tests catch problems early, before they break dashboards or drive bad business decisions. Validation can be done using basic dbt commands, some of which are listed in this guide.
Example
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
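Beyond the built-in generic tests, you can add a singular test: a SQL file in your tests/ folder that returns the rows that violate a rule. A sketch using the model and column names from the example above:

```sql
-- tests/assert_order_totals_positive.sql
-- The test fails if this query returns any rows,
-- i.e. if any order has a zero or negative total.
SELECT order_id, total_amount
FROM {{ ref('fct_orders') }}
WHERE total_amount <= 0
```

dbt picks this file up automatically and runs it alongside the generic tests on `dbt test`.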
6. Use Jinja and Macros for Reusability
dbt allows you to write SQL with Jinja (a templating language that makes your code more dynamic). You can also use dbt macros (reusable code snippets that work like functions).
You need a macro if you find yourself copying/pasting the same filters in multiple models. By using Jinja and macros, you help keep your project DRY (Don’t Repeat Yourself), making it easier to maintain.
Example
{% macro active_records() %}
is_active = true AND deleted_at IS NULL
{% endmacro %}
Then use it like this:
SELECT * FROM users WHERE {{ active_records() }}
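At compile time, dbt replaces the macro call with its body, so the query above becomes plain SQL, roughly:

```sql
SELECT * FROM users WHERE is_active = true AND deleted_at IS NULL
```

If the filter definition ever changes, you update the macro once and every model that calls it picks up the change.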
7. Use Tags and Metadata for Organization
Tags are simple labels you can add to models in your schema.yml. Use them to group and categorize models by team, use case, refresh frequency, or whatever else your project needs.
As your project grows, tags make it easy to run focused subsets of models, build targeted jobs or even filter by topic. Learn more about how to effectively use dbt tags and their benefits in our dedicated guide.
Example
models:
  - name: fct_orders
    tags: ['finance', 'daily_run']
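With those tags in place, dbt’s node selection syntax lets you target subsets of your project from the command line (these commands assume a working dbt installation and project):

```shell
# Run only the models tagged 'finance'
dbt run --select tag:finance

# Run everything except the daily-run models
dbt run --exclude tag:daily_run
```

This is what makes tag-based scheduled jobs possible: each job simply selects a different tag.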
8. Review and Monitor Performance
Think about how fast your dbt models run and how much they cost to process, and log these metrics so you can monitor them over time.
Even if your models work, they may not be fast or cheap. Slow models delay dashboards and waste money and time. Performance monitoring helps your project run smoothly as your data grows.
Example:
- Use your warehouse’s query history (in Snowflake, BigQuery, or Redshift) to spot slow-running queries.
- Break large, complex models into smaller, easier-to-manage ones.
- Use incremental models to avoid reprocessing all data each time.
- Filter your data early in the query to reduce unnecessary processing.
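The incremental pattern from the list above can be sketched like this (model and column names are hypothetical):

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, user_id, event_type, created_at
FROM {{ ref('stg_app__events') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than
  -- what is already in the target table
  WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on later runs, only new rows are scanned and merged in.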
For more info on performance, check out the comparison between dbt Core and dbt Cloud.
9. Adopt a Clear Development Workflow
Create a simple, consistent way for your team to work with dbt, whether it’s making changes to the code, testing them, or pushing them to production.
A shared workflow means fewer bugs, less confusion, and easier collaboration, and it makes the project far more approachable for people new to the team.
Example
- Use Git for version control.
- Create a new branch for each change.
- Open a pull request and get the code reviewed before merging.
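The branch-per-change flow above can be sketched with plain git commands. For illustration, this creates a throwaway repo; in practice you would run the checkout inside your dbt project (the branch name is hypothetical):

```shell
# Throwaway repo so the sketch is self-contained
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# One branch per change keeps pull requests small and reviewable
git checkout -q -b feature/add-orders-model
git branch --show-current   # prints feature/add-orders-model
```

From here, you would commit your model changes, push the branch, and open a pull request for review before merging.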
10. Automate Deployments with CI/CD
CI/CD (Continuous Integration/Continuous Deployment) is a way to automatically test and deploy your dbt models whenever you release new code. You can use tools like GitHub Actions, GitLab CI, or dbt Cloud for this automation.
Automating tests and deployment saves time and prevents errors. It ensures your code is always tested before it’s deployed and keeps your data pipeline trustworthy. You can learn more about setting up dbt CI/CD.
Example
- Run dbt build automatically whenever a pull request is opened.
- Deploy code to production automatically when it’s merged into the main branch.
- Schedule jobs to refresh models regularly (e.g., daily or hourly).
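A minimal GitHub Actions sketch of the first step. The workflow name, adapter, and profile setup are assumptions; it presumes warehouse credentials are available to CI (e.g. as repository secrets) via a profiles.yml checked in for CI use:

```yaml
# .github/workflows/dbt_ci.yml
name: dbt CI

on:
  pull_request:

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-snowflake  # adapter choice is an assumption
      - run: dbt deps
      - run: dbt build --profiles-dir .          # assumes a CI profiles.yml in the repo root
```

Because `dbt build` runs models and tests together, a failing test blocks the pull request before bad code reaches production.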
11. Choose the Right Materialization Strategy
In dbt, materializations control how your models are built in the warehouse. The right choice depends on a model’s size, complexity, and how often it’s queried.
The materialization you pick can make or break the performance of the system. For example, using a table for large, complex logic can speed up queries, while a view is better for lightweight transformations.
Example:
- View: Good for small models or when you’re experimenting; nothing is stored, so it’s always fresh.
- Table: Best for large models that are queried often; reads are faster because results are precomputed.
- Incremental: Ideal for very large tables where only new or changed rows need processing on each run.
Materializations help solve common data problems like slow performance or large data volumes. Learn how they’re used in real projects in these dbt use cases.
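You can encode this strategy as folder-level defaults in dbt_project.yml, so individual models only override it when needed (the project name is hypothetical):

```yaml
# dbt_project.yml
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```

With these defaults, staging models stay lightweight views while marts are built as tables for fast dashboard queries.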
12. Define Metrics and Dimensions with the dbt Semantic Layer
The dbt Semantic Layer lets you define metrics and dimensions once, in code, so every downstream tool works from the same definitions.
Without a shared definition, everyone might calculate the same metric (like revenue or active users) differently, which leads to confusion.
The semantic layer solves this by giving your team one clear definition for each metric, so everyone’s on the same page, no more arguments over numbers.
Example
You can define a metric like total_revenue once in dbt and then use it across all your tools, with no need to redefine it each time.
Define dimensions like customer_segment to slice your metrics consistently wherever they’re used.
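A rough sketch of how this can look with dbt’s MetricFlow-based spec. All names here are hypothetical, and the full schema has more required fields, so treat this as an outline rather than a complete definition:

```yaml
semantic_models:
  - name: orders
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: customer_segment
        type: categorical
    measures:
      - name: revenue
        agg: sum
        expr: total_amount

metrics:
  - name: total_revenue
    label: "Total revenue"
    type: simple
    type_params:
      measure: revenue
```

Any BI tool connected to the Semantic Layer now computes total_revenue the same way, sliced by the same customer_segment definition.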
Conclusion
By following these dbt best practices, you can make sure your dbt projects are organized, efficient, and scalable. Each of these practices will help you to improve the quality and maintainability of your data models.
These principles will help you easily manage new projects, align your teams, and develop solid data pipelines that ultimately deliver correct insights.
Ready to streamline your data transformation process? Try Hevo Transformer today to enhance your dbt workflows and accelerate data processing!
Frequently Asked Questions about dbt Best Practices
1. What is dbt testing?
dbt testing involves running checks, such as ensuring primary keys are unique or values are not null, to keep your data pipeline reliable and accurate.
2. What testing strategies should I follow in dbt?
Common testing strategies in dbt include:
Schema tests: Test constraints like uniqueness or non-null values.
Data tests: Validate business logic and custom rules.
Custom tests: Create specific tests tailored to your data’s needs.
3. Why should you codify your best practices in dbt?
Codifying best practices ensures consistency across projects, reduces errors, and makes scaling and maintaining the project easier.
4. How do I implement dbt infrastructure on a big team?
For large teams, organizing your project with layers (e.g., staging, intermediate, and marts) is key to avoiding conflicts. Also, use Git for version control and stick to naming conventions.