dbt has revolutionized data transformation in many ways. It introduced engineering practices to the analytics world. It offers commands like dbt init, dbt compile, dbt run, dbt seed, and others. One such important dbt command is dbt build. It simplifies the process of building, testing, and deploying data models. This is particularly beneficial for optimizing your dbt models for production environments.
Let’s explore how you can use the dbt build command. We’ll also see the use cases where you can run dbt build along with examples that differentiate dbt build and dbt run commands.
What is the dbt build command?
dbt build resolves all the dependencies between your models, tests, seeds, and snapshots before running them. Essentially, it combines dbt run, dbt test, dbt snapshot, and dbt seed into a single operation that executes in dependency order.
This ensures that your models are correctly built, your tests validate the integrity of your data, and your snapshots reflect the latest updates. This makes it essential for maintaining efficiency and data accuracy in production workflows.
Why dbt build?
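- Runs Everything at Once – It executes seeds, models, tests, and snapshots in a single command.
- Works with Incremental Models – Models materialized as incremental only process new data on each run instead of reloading everything. This comes from the model's configuration, and dbt build respects it.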
- Understands Dependencies – It ensures that models run in the right order based on dependencies.
- Better Error Handling – If a model fails, you can rerun just that model (for example with --select) instead of restarting the whole pipeline. Also, if model A depends on model B and a test on model B fails, model A is skipped rather than built on bad data.
- Saves Time – Eliminates redundant steps and speeds up development.
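To picture the dependency ordering and the "skip downstream on failure" behaviour from the list above, here is a toy Python simulation. It is not dbt's implementation: the node names and the simulate_build function are made up for illustration, and the only assumption is that a model waits for the tests on its parents before it is built.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each node maps to the nodes it depends on.
# model_a depends on model_b, and each test depends on the model it checks.
DEPENDENCIES = {
    "seed.raw_customers": set(),
    "model.model_b": {"seed.raw_customers"},
    "test.not_null_model_b_id": {"model.model_b"},
    "model.model_a": {"model.model_b"},
    "test.unique_model_a_id": {"model.model_a"},
}


def simulate_build(dependencies, failing_nodes):
    """Walk the graph in dependency order and return a status per node."""
    # Collect the tests defined on each node.
    tests_on = {}
    for node, parents in dependencies.items():
        if node.startswith("test."):
            for parent in parents:
                tests_on.setdefault(parent, set()).add(node)

    # Make every non-test node wait for the tests on its parents.
    graph = {}
    for node, parents in dependencies.items():
        upstream = set(parents)
        if not node.startswith("test."):
            for parent in parents:
                upstream |= tests_on.get(parent, set())
        graph[node] = upstream

    statuses = {}
    for node in TopologicalSorter(graph).static_order():
        if any(statuses[up] != "ok" for up in graph[node]):
            statuses[node] = "skipped"  # something upstream did not succeed
        elif node in failing_nodes:
            statuses[node] = "failed"
        else:
            statuses[node] = "ok"
    return statuses


# Pretend the test on model_b fails and see what happens downstream.
for name, status in sorted(simulate_build(DEPENDENCIES, {"test.not_null_model_b_id"}).items()):
    print(f"{name}: {status}")
```

In this sketch model_b is still built, but model_a and its test are skipped because the test on model_b failed.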
How to use the dbt build command?
Here is what I got when I ran dbt build in my project directory from the command prompt.
Running with dbt=1.5.0
Found 4 models, 6 tests, 1 snapshot, 1 seed file, 3 sources
18:30:21 | Concurrency: 2 threads (target='dev')
18:30:21 |
18:30:21 | 1 of 8 START seed file analytics.customer_spending_data............... [RUN]
18:30:22 | 1 of 8 OK loaded seed file analytics.customer_spending_data........... [INSERT 500 in 0.09s]
18:30:22 | 2 of 8 START view model analytics.monthly_spending_trends............ [RUN]
18:30:22 | 2 of 8 OK created view model analytics.monthly_spending_trends....... [CREATE VIEW in 0.15s]
18:30:22 | 3 of 8 START model analytics.top_customers........................... [RUN]
18:30:22 | 3 of 8 OK created table model analytics.top_customers................ [CREATE TABLE in 0.18s]
18:30:22 | 4 of 8 START test not_null_monthly_spending_trends_customer_id....... [RUN]
18:30:22 | 4 of 8 PASS not_null_monthly_spending_trends_customer_id............. [PASS in 0.05s]
18:30:22 | 5 of 8 START test unique_top_customers_customer_id.................. [RUN]
18:30:22 | 5 of 8 PASS unique_top_customers_customer_id......................... [PASS in 0.04s]
18:30:22 | 6 of 8 START snapshot analytics.customer_spending_snapshot........... [RUN]
18:30:22 | 6 of 8 OK snapshotted analytics.customer_spending_snapshot........... [INSERT 30 in 0.20s]
18:30:22 | 7 of 8 START test relationships_monthly_spending_trends_transactions. [RUN]
18:30:22 | 7 of 8 PASS relationships_monthly_spending_trends_transactions....... [PASS in 0.06s]
18:30:22 | 8 of 8 START model analytics.daily_spending_summary.................. [RUN]
18:30:22 | 8 of 8 OK created table model analytics.daily_spending_summary....... [CREATE TABLE in 0.22s]
18:30:22 |
18:30:22 | Finished running 1 seed, 3 models, 3 tests, 1 snapshot in 1.15s.
Completed successfully
Done. PASS=8 WARN=0 ERROR=0 SKIP=0 TOTAL=8
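If you run dbt build from a script or CI job, you may want to read the final summary line programmatically. Below is a minimal Python sketch that assumes the summary looks exactly like the "Done. PASS=... TOTAL=..." line in the log above. The exact format can differ between dbt versions, and parse_summary is a hypothetical helper, not part of dbt.

```python
import re

# Matches a summary line shaped like the one in the log above.
SUMMARY_PATTERN = re.compile(
    r"PASS=(?P<passed>\d+) WARN=(?P<warn>\d+) ERROR=(?P<error>\d+) "
    r"SKIP=(?P<skip>\d+) TOTAL=(?P<total>\d+)"
)


def parse_summary(log_text):
    """Return the summary counts as integers, or None if no summary is found."""
    match = SUMMARY_PATTERN.search(log_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


counts = parse_summary("Done. PASS=8 WARN=0 ERROR=0 SKIP=0 TOTAL=8")
print(counts)
# {'passed': 8, 'warn': 0, 'error': 0, 'skip': 0, 'total': 8}
```

A script could then fail the job whenever the error count is greater than zero.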
Do you want to run only on a specific model?
Syntax: dbt build --select <model_name>
Do you want to exclude a model?
Syntax: dbt build --exclude <model_name>
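When dbt build is triggered from a scheduler or script, it can help to assemble the command in one place. Here is a small Python sketch that uses only the --select and --exclude flags shown above. The build_command helper is hypothetical, and the model names are taken from the sample log purely as an example.

```python
import shlex


def build_command(select=None, exclude=None):
    """Return the argument list for a dbt build invocation."""
    command = ["dbt", "build"]
    if select:
        command += ["--select", *select]
    if exclude:
        command += ["--exclude", *exclude]
    return command


command = build_command(
    select=["top_customers", "daily_spending_summary"],
    exclude=["monthly_spending_trends"],
)
print(shlex.join(command))
# dbt build --select top_customers daily_spending_summary --exclude monthly_spending_trends
```

The returned list can be handed to subprocess.run(command, check=True) on a machine where dbt is installed and a project is configured.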
dbt build vs dbt run
dbt build and dbt run aren’t the same: dbt run only executes models, while dbt build also runs tests, snapshots, and seeds. Let me walk you through a simple example to show the difference.
Imagine you’re analyzing spending patterns of customer transactions to identify loyal, long-term customers.
Step 1: Create a dbt model, high_spenders.sql, to analyze the total spending of each customer
WITH transactions AS (
    SELECT
        customer_id,
        SUM(amount) AS total_spent
    FROM {{ ref('transactions') }}
    GROUP BY customer_id
)
SELECT
    customer_id,
    total_spent,
    CASE
        WHEN total_spent > 1000 THEN 'high spender'
        ELSE 'regular'
    END AS spender_category
FROM transactions
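If you want to sanity-check the model's logic before running it, the same aggregation and CASE expression can be mirrored in a few lines of Python. This is only a sketch: the sample rows are made up, and categorize_spenders is a hypothetical helper, not something dbt provides.

```python
from collections import defaultdict

# Made-up rows standing in for the transactions table.
TRANSACTIONS = [
    {"customer_id": 1, "amount": 700.0},
    {"customer_id": 1, "amount": 450.0},
    {"customer_id": 2, "amount": 1000.0},
    {"customer_id": 3, "amount": 20.0},
]


def categorize_spenders(transactions, threshold=1000):
    """Sum amounts per customer, then label each customer like the CASE expression."""
    totals = defaultdict(float)
    for row in transactions:
        totals[row["customer_id"]] += row["amount"]
    return [
        {
            "customer_id": customer_id,
            "total_spent": total,
            # Strictly greater than the threshold, as in the SQL.
            "spender_category": "high spender" if total > threshold else "regular",
        }
        for customer_id, total in sorted(totals.items())
    ]


for row in categorize_spenders(TRANSACTIONS):
    print(row)
```

Note that customer 2, who spent exactly 1000, is labelled regular because the comparison is strict.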
You can either execute the dbt run or dbt build command, depending on your use case. Let’s explore both scenarios.
Step 2: Execute dbt run
dbt run --select high_spenders
This command will execute the high_spenders model. However, we aren’t sure the data we have is clean and accurate. So, we’ll add some tests to our model.
Step 3: Let’s add a dbt test on the total_spent column to make sure its values are not negative. This test comes from the dbt_expectations package, which must be installed in your project.
YAML file:
models:
  - name: high_spenders
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          column_name: total_spent
          min_value: 0
Step 4: Run dbt build
dbt build --select high_spenders
We know that: dbt build = dbt run + dbt test + dbt snapshot + dbt seed
In a nutshell, when we run dbt build, the following happens:
- The high_spenders.sql model gets executed.
- The tests defined on the model in high_spenders_test.yml run to validate data quality.
- If a snapshot such as customer_snapshot.sql is part of the selection, dbt takes it to track changes. With --select high_spenders alone, only the model and its tests run.
- JSON artifacts are generated: manifest.json contains the project's metadata, and run_results.json contains the detailed execution results.
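Since run_results.json holds the execution results, you can inspect it after a build. Here is a minimal Python sketch. It assumes the artifact sits in the default target/ folder and has a top-level "results" list whose entries carry "unique_id" and "status" fields. The sample data and both helper functions are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_run_results(path="target/run_results.json"):
    """Read the artifact from disk (assumes the default target/ folder)."""
    return json.loads(Path(path).read_text())


def summarize_run_results(run_results):
    """Count how many nodes ended in each status."""
    return Counter(result["status"] for result in run_results.get("results", []))


def failed_nodes(run_results):
    """Return the unique_ids of nodes that errored or failed a test."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result["status"] in {"error", "fail"}
    ]


# Made-up sample in the assumed shape of run_results.json.
SAMPLE = {
    "results": [
        {"unique_id": "model.my_project.high_spenders", "status": "success"},
        {"unique_id": "test.my_project.total_spent_between", "status": "fail"},
    ]
}

print(summarize_run_results(SAMPLE))
print(failed_nodes(SAMPLE))
```

In a real project you would call summarize_run_results(load_run_results()) after dbt build finishes.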
Comparison table: dbt build vs dbt run
| Feature | dbt run | dbt build |
|---|---|---|
| Runs models | Yes | Yes |
| Runs tests | No | Yes |
| Tracks snapshots | No | Yes |
| Recommended for development | Yes | No |
| Recommended for production | No | Yes |
Key Takeaway
That was a lot to go over, but hopefully you’re now much better informed about dbt build and the awesome ways you can use it. The best way to grasp it better is by implementing it in your data transformation project, so go ahead.
In a nutshell, these are the key takeaways from this article:
- dbt build suits production runs because it tests each model right after building it and skips downstream models when a test fails, which limits the spread of data quality issues.
- When the schema of an incremental model changes, pass the --full-refresh flag to rebuild the model from scratch.
- Use flags like --select and --exclude to run targeted models, tests, seeds, and snapshots.
- Monitor execution logs regularly and watch for unforeseen errors to prevent failed executions.
FAQs
1. What does dbt build do?
The dbt build command does a lot. It runs your models, tests them, captures snapshots to track changes, and loads CSV seed files into tables. This single command builds and tests your entire dbt project in one go.
2. How is dbt build different from dbt run?
dbt run only executes models, while dbt build does everything: it runs models, tests, snapshots, and seeds.
3. Can I run dbt build on specific models?
Yes, you can use the --select flag for it. Syntax: dbt build --select <model_name>
4. Does dbt build run incremental models?
Yes. For models materialized as incremental, dbt build processes only new data instead of rebuilding the entire model, unless you pass the --full-refresh flag.
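To make the incremental idea concrete, here is a toy Python sketch of the "only process new data" logic. In a real dbt model this filtering is written in SQL, typically inside an is_incremental() block. The loaded_at column, the sample rows, and the incremental_build function are all assumptions made for illustration.

```python
def incremental_build(existing_rows, source_rows, full_refresh=False):
    """Return the rows the target table would hold after one run."""
    if full_refresh or not existing_rows:
        # First run or full refresh: rebuild the table from the whole source.
        return list(source_rows)
    # Incremental run: append only rows newer than what is already loaded.
    latest = max(row["loaded_at"] for row in existing_rows)
    new_rows = [row for row in source_rows if row["loaded_at"] > latest]
    return existing_rows + new_rows


# Made-up data: the table already holds days 1 and 2, and the source now has day 3.
existing = [{"id": 1, "loaded_at": 1}, {"id": 2, "loaded_at": 2}]
source = existing + [{"id": 3, "loaded_at": 3}]

print(len(incremental_build(existing, source)))
print(len(incremental_build(existing, source, full_refresh=True)))
```

Both calls end with the same three rows here, but the incremental run only had to handle one new row, while the full refresh reprocessed everything.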