If you’re using dbt (data build tool), you’re already on the right track toward building scalable data models. But as teams grow and projects get more complex, it’s easy for a dbt setup to get messy and for things to start breaking.

The good news is that these are common problems with known solutions. By adopting a few best practices, you can keep your project clean and your data trustworthy.

If you’re new to dbt or want a solid introduction to how it works, start with a clear overview of what dbt is and why it’s useful. In this article, we’ll walk through 12 dbt best practices to help you write better code and build more confidence in your data.

1. Establish a Clear Project Structure

A clear project structure means organizing your dbt files and folders in a logical way. That usually means separating models, tests, documentation, and config files into their own subfolders. 

In short, it’s a clean folder system that lets anyone on your team open the project and immediately understand what’s going on.

Without structure, it’s hard to find models, trace how data flows, or pinpoint what’s wrong. A well-structured project is easier to manage, scale, and collaborate on.

To better understand how project structure fits into the overall setup, check out this dbt architecture page, which explains how models, sources, and tests work together in dbt.

Example

A typical dbt project is organized into three main folders inside the models/ directory:

models/
├── staging/
├── intermediate/
└── marts/

Each layer has a clear role:

  • Staging: Raw data is cleaned and standardized here.
  • Intermediate: Business logic that prepares the data for final use.
  • Marts: The final, business-ready tables that are used in dashboards and reports.

2. Use Consistent Naming Conventions

Consistent naming conventions are a simple, effective way to ensure everyone on your team names things (models, files, columns) in the same meaningful way.

When names are predictable, it’s much easier to understand what each model does just by looking at it. You spend less time guessing and more time working.

Example

Use prefixes to show where models fit in the pipeline:

  • stg_ for staging models
  • int_ for intermediate models
  • fct_ and dim_ for fact and dimension models

3. Use Sources and Ref Functions Properly

In dbt, you don’t have to hardcode table names directly into your SQL. Instead, use the ref() function to reference other models and source() to pull raw tables from your data warehouse.

Using ref() and source() not only keeps your code cleaner but also builds your project’s dependency graph, so dbt can run models in the right order, adapt automatically when table names change, and visualize data lineage in dbt docs.

To learn more about using sources effectively, check out this guide on dbt sources.

Example

Instead of writing hardcoded SQL like this:

SELECT * FROM raw.sales_data

SELECT * FROM analytics.daily_orders

You can use dbt’s source() and ref() functions:

SELECT * FROM {{ source('raw', 'sales_data') }}

SELECT * FROM {{ ref('daily_orders') }}
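Note that source() only resolves if the source is declared in a YAML file first. A minimal declaration might look like this (the schema name here is illustrative):

sources:
  - name: raw
    schema: raw
    tables:
      - name: sales_data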

4. Document Models and Sources

In dbt, you can describe what your models and fields do using simple YAML files. This builds documentation right into your project without requiring any extra tools.

Good documentation makes it easier for everyone on your team to understand your data’s meaning and how it’s used. It also saves time by answering common questions right in the project; there’s no need to ask around.

Example

models:
  - name: fct_orders
    description: "Fact table containing all customer orders."
    columns:
      - name: order_id
        description: "Primary key for the order. Uniquely identifies each order."
      - name: total_amount
        description: "Total value of the order."

5. Write and Enforce Data Tests

dbt lets you write tests that check assumptions about your data — things like “this column should never be null.” Use the built-in generic tests (unique, not_null, accepted_values, relationships) for the basics, or write your own SQL tests for custom rules.

Your transformations are only as good as the data you’re transforming. Tests catch problems early, before they cause broken dashboards or bad business decisions. Validation can be done using basic dbt commands, some of which are listed in this guide.

Example

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
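For business rules the built-in tests don’t cover, you can write a singular test: a SQL file in the tests/ folder that returns the rows violating the rule (the file name below is illustrative; dbt fails the test if the query returns any rows):

-- tests/assert_no_negative_order_amounts.sql
-- Fails if any order has a negative total
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE total_amount < 0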

6. Use Jinja and Macros for Reusability

dbt allows you to write SQL with Jinja (a templating language that makes your code more dynamic). You can also use dbt macros (reusable code snippets that work like functions).

If you find yourself copying and pasting the same filter into multiple models, you need a macro. Jinja and macros keep your project DRY (Don’t Repeat Yourself), making it easier to maintain.

Example

{% macro active_records() %}
    is_active = true AND deleted_at IS NULL
{% endmacro %}

Then use it like this:

SELECT * FROM users WHERE {{ active_records() }}

7. Use Tags and Metadata for Organization

Tags are simple labels you can add to models in your schema.yml. Use them to group and categorize models by team, use case, refresh frequency, or whatever else your project needs.

As your project grows, tags make it easy to run focused subsets of models, build targeted jobs or even filter by topic. Learn more about how to effectively use dbt tags and their benefits in our dedicated guide.

Example

models:
  - name: fct_orders
    tags: ['finance', 'daily_run']
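With tags in place, you can run or test just the tagged subset from the command line using dbt’s node selection syntax:

dbt run --select tag:finance       # run only the finance-tagged models
dbt test --select tag:daily_run    # test only the daily-run models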

8. Review and Monitor Performance

Pay attention to how long your dbt models take to run and how much they cost to process, and log these metrics so you can monitor them over time.

Even if your models work, they may not be fast or cheap. Slow models delay dashboard refreshes and waste compute. Performance monitoring helps your project run smoothly as your data grows.

Example:

  • Use your warehouse’s query history (Snowflake, BigQuery, or Redshift) to spot slow-running queries.
  • Break large, complex models into smaller, easier-to-manage ones.
  • Use incremental models to avoid reprocessing all data each time.
  • Filter your data early in the query to reduce unnecessary processing.

For more info on performance, check out the comparison between dbt Core and dbt Cloud.

9. Adopt a Clear Development Workflow

Create a simple, consistent way for your team to work with dbt, whether it’s making changes to the code, testing them, or pushing them to production.

A clear workflow means fewer bugs, less confusion, and easier collaboration, and it makes the project far more approachable for people new to the team.

Example

  • Use Git for version control.
  • Create a new branch for each change.
  • Open a pull request and get the code reviewed before merging.

10. Automate Deployments with CI/CD

CI/CD (Continuous Integration/Continuous Deployment) is a way to automatically test and deploy your dbt models whenever you release new code. You can use tools like GitHub Actions, GitLab CI, or dbt Cloud for this automation.

Automating tests and deployment saves time and prevents errors: your code is always tested before it’s deployed, so your data pipeline stays trustworthy. You can learn more about setting up dbt CI/CD.

Example

  • Run dbt build automatically whenever a pull request is opened.
  • Deploy code to production automatically when it’s merged into the main branch.
  • Schedule jobs to refresh models regularly (e.g., daily or hourly).
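As a rough sketch, a GitHub Actions workflow for the first step might look like this (the job name, dbt version, and Snowflake adapter are illustrative assumptions, and a ci target is assumed to exist in profiles.yml):

name: dbt CI
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-snowflake   # adapter depends on your warehouse
      - run: dbt deps
      - run: dbt build --target ci                # runs and tests every model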

11. Choose the Right Materialization Strategy

In dbt, materializations control how your models are built in the warehouse, and the right one depends on each model’s size, complexity, and how often it’s queried.

The materialization you pick can make or break the performance of the system. For example, using a table for large, complex logic can speed up queries, while a view is better for lightweight transformations.

Example:

  • View: Good for small, lightweight models or when you’re iterating.
  • Table: Best for large models that are queried often; reads are faster against a precomputed table.
  • Incremental: Ideal for very large tables; only new or changed rows are processed on each run.
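In practice, you set the materialization in the model’s config block. An incremental model, for example, might be sketched like this (the column names are illustrative):

{{ config(materialized='incremental', unique_key='order_id') }}

SELECT *
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- only process rows newer than what's already in the target table
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}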

Materializations help solve common data problems like slow performance or large data volumes. Learn how they’re used in real projects in these dbt use cases.

12. Define Metrics and Dimensions with the dbt Semantic Layer

The dbt Semantic Layer lets you define metrics and dimensions once, on top of your dbt models, so that this metadata is shared automatically with every downstream tool.

Without a shared definition, everyone might calculate the same metric (like revenue or active users) differently, which leads to confusion.

The semantic layer solves this by giving your team one clear definition for each metric, so everyone’s on the same page, no more arguments over numbers.

Example

You can define a metric like total_revenue once in dbt, and then use it across all your tools. No need to redefine it each time.

Define dimensions like customer_segment so you can slice those metrics consistently wherever they’re used.
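The two ideas above can be sketched in the MetricFlow-style YAML spec used by the dbt Semantic Layer (entity, measure, and dimension names here are illustrative):

semantic_models:
  - name: orders
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: customer_segment
        type: categorical
    measures:
      - name: total_amount
        agg: sum

metrics:
  - name: total_revenue
    description: "Sum of all order amounts."
    type: simple
    type_params:
      measure: total_amount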

Conclusion

By following these dbt best practices, you can make sure your dbt projects are organized, efficient, and scalable. Each of these practices will help you to improve the quality and maintainability of your data models.

These principles will help you manage new projects with ease, align your teams, and build solid data pipelines that deliver accurate insights.

Ready to streamline your data transformation process? Try Hevo Transformer today to enhance your dbt workflows and accelerate data processing!

Frequently Asked Questions about dbt Best Practices

1. What is dbt testing?

dbt testing involves running checks like ensuring primary keys are unique or values are not null, ensuring a reliable and accurate data pipeline.

2. What testing strategies should I follow in dbt?

Common testing strategies in dbt include:

  • Schema tests: Test constraints like uniqueness or non-null values.
  • Data tests: Validate business logic and custom rules.
  • Custom tests: Create specific tests tailored to your data’s needs.

3. Why should you codify your best practices in dbt?

Codifying best practices ensures consistency across projects, reduces errors, and makes scaling and maintaining the project easier.

4. How do I implement dbt infrastructure on a big team?

For large teams, organizing your project with layers (e.g., staging, intermediate, and marts) is key to avoiding conflicts. Also, use Git for version control and stick to naming conventions.

Srujana Maddula
Technical Content Writer

Srujana is a seasoned technical content writer with over 3 years of experience. She specializes in data integration and analysis and has worked as a data scientist at Target. Using her skills, she develops thoroughly researched content that uncovers insights and offers actionable solutions to help organizations navigate and excel in the complex data landscape.