Businesses today depend heavily on their internal data to make informed decisions. Reliable data, backed by a scalable and maintainable data transformation workflow, is therefore a must-have. Data Build Tool (dbt) – an open-source SQL-based framework – brings exactly that to the table. It empowers data teams to transform, test, and document data directly in their warehouse.
Data Build Tool (dbt) is purpose-built for the transformation step of the ELT (Extract, Load & Transform) workflow of any data pipeline. It runs on top of data warehouse services like Snowflake, BigQuery, and Redshift, leveraging their capabilities to ensure reliability and maintainability. With dbt, analytics engineers can build modular, version-controlled data pipelines using simple SQL, making it a cornerstone of the modern data stack.
In this blog, let's explore everything you need to know about dbt — from core concepts to advanced techniques, including how to use the dbt run command effectively.
What is Data Build Tool (dbt)?
Data Build Tool (dbt) is an open-source data transformation framework that helps data engineers and analysts build modular, reliable, and highly maintainable data transformation workflows. Its SQL-first approach has made it popular among data professionals. It enables data teams to:
- Transform raw data into analytics-ready datasets
- Test data quality with built-in assertions
- Document models for better collaboration
- Deploy changes with version control (Git)
Primarily, dbt follows two key principles:
- Bringing software engineering practices (version control, testing, modular code) to the data transformation workflow
- Leveraging data warehouses like Redshift, BigQuery, and Snowflake as the compute layer that runs the transformation jobs
How Does dbt Fit in the Modern-Day Data Stack?
Data Build Tool (dbt) sits between your data warehouse and BI tools, transforming raw data into clean, modeled datasets:
Data Sources → (Ingestion via Hevo) → Data Warehouse → (Transformation via Hevo Transformer, powered by dbt Core) → BI Tools (Looker/Tableau)
Key Features of dbt
Let's take a quick look at the key features of dbt:
Feature | Description |
SQL-based transformations | Write transformations in SQL, no Spark/Python needed |
Dependency management | Automatic DAG (Directed Acyclic Graph) generation |
Modular data modeling | Build reusable models (staging → marts) |
Data testing | Validate quality with not_null, unique tests |
Documentation | Auto-generate docs with dbt docs generate |
Incremental models | Only process new data for efficiency |
Macros (Jinja templating) | Reusable SQL snippets (e.g., date_trunc helpers) |
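As a taste of the macro system, here is what a small Jinja helper might look like (a minimal sketch; the macro and column names are illustrative):

-- macros/truncate_to_day.sql (illustrative helper)
{% macro truncate_to_day(column_name) %}
    date_trunc('day', {{ column_name }})
{% endmacro %}

-- using the macro inside a model
SELECT {{ truncate_to_day('created_at') }} AS created_day
FROM {{ ref('stg_orders') }}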
Core Components of dbt
A. Projects & File Structure
A dbt project is organized as:
my_project/
├── dbt_project.yml # Config file
├── models/ # SQL models
│ ├── staging/ # Raw data cleaning
│ ├── intermediate/ # Business logic
│ └── marts/ # Final datasets
├── seeds/ # CSV reference data
├── tests/ # Data quality tests
└── macros/ # Reusable SQL functions
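To tie the structure together, a minimal dbt_project.yml might look like this (a sketch; the project and profile names are illustrative):

# dbt_project.yml (minimal sketch)
name: my_project
version: '1.0.0'
profile: my_warehouse        # must match a profile in profiles.yml
model-paths: ["models"]
seed-paths: ["seeds"]
models:
  my_project:
    staging:
      +materialized: view    # lightweight cleaning layers stay virtual
    marts:
      +materialized: table   # final datasets get physical tables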
B. Models (The Building Blocks)
Models are SQL files that define transformations:
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}
SELECT
id AS order_id,
customer_id,
amount
FROM {{ source('shop', 'raw_orders') }}
C. Materializations (How Models Are Built)
dbt supports different materialization strategies:
Type | Description | Use Case |
Table | Creates a physical table | Final datasets |
View | Creates a virtual view | Lightweight transformations |
Incremental | Only updates new data | Large fact tables |
Ephemeral | CTE (no database object) | Intermediate logic |
Configure the strategy per model in SQL by picking one of the four values:
{{ config(materialized='table') }}  -- or 'view', 'incremental', 'ephemeral'
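Incremental models deserve a closer look, since they rely on the is_incremental() check to process only new rows. A minimal sketch (table and column names are illustrative):

-- models/marts/fct_orders.sql (illustrative)
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
    order_id,
    customer_id,
    amount,
    updated_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what's already in the target table
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}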
D. Sources, Ref, and Dependencies
- {{ source() }} – References raw tables
- {{ ref() }} – Links to other dbt models
Example:
-- models/marts/orders.sql
SELECT * FROM {{ ref('stg_orders') }}
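For {{ source() }} to resolve, the raw table must be declared in a YAML file. A declaration matching the earlier stg_orders example might look like this (the schema name is an assumption):

# models/staging/sources.yml
version: 2
sources:
  - name: shop
    schema: raw            # assumed schema holding the raw tables
    tables:
      - name: raw_orders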
Data Build Tool (dbt) Command Line Utility
The command-line utility offered by dbt is your one-stop interface for running, testing, and orchestrating dbt workflows. Each dbt command serves a specific purpose in the data transformation workflow. Let us explore the major dbt commands with detailed explanations of their functionality, common use cases, and practical examples.
dbt run – Execute Your Data Models
Purpose: The primary command for executing your dbt models in the correct dependency order.
Key Functionality:
- Compiles SQL models into executable queries
- Determines execution order using the DAG (Directed Acyclic Graph)
- Handles incremental model updates
- Manages database transactions
Common Usage Examples:
# Basic model execution
dbt run
# Run specific models with their dependencies
dbt run --select +my_model+
# Force full refresh of incremental models
dbt run --full-refresh
# Run with parallel threads for performance
dbt run --threads 8
When to Use:
- During development to test model changes
- In production pipelines to update datasets
- When you need to rebuild specific portions of your DAG
Advanced User Tip: Combine with selection syntax to precisely control which models execute.
Dive Deeper into the dbt run Command
The dbt run command is the workhorse of dbt (Data Build Tool), responsible for executing your data transformation models in the correct dependency order. This section provides an in-depth look at the four powerful control mechanisms that give you granular control over model execution:
- Arguments – Select specific models to run
- Flags – Boolean switches that modify behavior
- Options – Key-value configurations
- Parameters – Runtime variables for dynamic SQL
Understanding these components will help you optimize performance, debug efficiently, and customize runs for different environments.
dbt run Arguments: Precision Model Selection
Arguments control which models get executed through flexible selection syntax.
Argument Pattern | Description | Example |
--select model_name | Run a single model | dbt run --select orders |
--select +model_name+ | Model with all upstream and downstream dependencies | dbt run --select +orders+ |
--select @model_name | Model, its descendants, and the ancestors of those descendants | dbt run --select @orders |
--select tag:tag_name | Tag-based selection | dbt run --select tag:daily |
--select path/to/models | Directory selection | dbt run --select models/marts |
--select config.materialized:table | Materialization filter | dbt run --select config.materialized:incremental |
--exclude model_name | Exclusion pattern | dbt run --exclude slow_model |
Advanced Selection Examples:
# Multiple selection criteria
dbt run --select "tag:hourly+,config.materialized:incremental"
# State-based execution (CI/CD)
dbt run --select state:modified
dbt run --select state:new
# Hybrid patterns
dbt run --select "marts/sales/+,tag:production"
# Combine + and tag
dbt run --select +tag:critical+ --exclude tag:deprecated
dbt run Flags: Execution Behavior Modifiers
Flags are boolean switches (true/false) that alter how models run.
Flag | Purpose | Use Case |
--full-refresh | Forces full rebuild of incremental models | Schema changes, backfills |
--fail-fast | Stops execution on first failure | CI pipelines, debugging |
--debug | Shows detailed debug logs | Troubleshooting |
--quiet | Suppresses non-essential output | Scripting/automation |
--no-version-check | Skips dbt version verification | Airflow/constrained environments |
Flag Combinations:
# Safe execution pattern
dbt run --full-refresh --fail-fast --threads 4
# Debugging workflow
dbt run --select problem_model --debug --fail-fast
Note: Flags can be combined with any selection arguments. Also note that dbt run has no dry-run flag of its own; to compile models without executing them, use dbt compile.
dbt run Options: Configurable Execution Parameters
Options accept values to fine-tune execution.
Option | Values | Description | Example |
--threads | Integer | Parallel execution threads | --threads 8 |
--target | Profile target | Environment selection | --target prod |
--vars | JSON/YAML | Runtime variables | --vars '{"key":"value"}' |
--state | Path | Artifacts for state comparison | --state ./prod_artifacts |
--defer | (none) | Resolve unselected references from another environment's artifacts | --defer --state ... |
--profiles-dir | Path | Alternate profiles directory | --profiles-dir ./secure |
Option Examples:
# Production deployment with variables
dbt run --target prod --vars '{"run_date":"2023-01-01"}'
# State comparison in CI
dbt run --select state:modified --state ./prod_artifacts
# High-parallelism run
dbt run --threads 16 --target dev
dbt run Parameters: Dynamic SQL Variables
Parameters (variables) enable runtime customization of SQL.
Implementation Guide:
- Reference variables in models:
-- models/orders.sql
SELECT * FROM {{ ref('stg_orders') }}
WHERE order_date >= '{{ var("start_date") }}'
- Pass values via CLI:
dbt run --vars '{"start_date":"2023-01-01"}'
- Defaults in dbt_project.yml:
vars:
start_date: "2020-01-01"
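As a shortcut, var() also accepts an inline default as its second argument, which removes the need for a dbt_project.yml entry when a fallback value is all you need:

-- inline default, used when --vars does not supply start_date
WHERE order_date >= '{{ var("start_date", "2020-01-01") }}'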
Advanced Parameter Techniques:
# Multi-variable input
dbt run --vars '{"start_date":"2023-01-01", "end_date":"2023-12-31"}'
# YAML format for complex values
dbt run --vars "{key: value, nested: {key2: value2}}"
Execution Strategy Cheat Sheet
Scenario | Recommended Command |
Development testing | dbt run --select +my_model |
Production refresh | dbt run --target prod --full-refresh --threads 8 |
CI pipeline | dbt run --select state:modified --fail-fast |
Debugging | dbt run --select problem_model --debug |
Backfill | dbt run --select config.materialized:incremental --full-refresh --vars '{"backfill":true}' |
Best Practices
- Start small: Test with --select before performing full runs
- Use threads wisely: 2-4x your warehouse's cluster count is a reasonable starting point
- Tag strategically: Organize models by refresh frequency (see the tagging sketch below)
- Leverage state: Use --state for efficient CI/CD
- Document variables: Maintain a VARIABLES.md markdown file for reference
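Here is what the tagging pattern looks like in practice (a sketch; the tag name is illustrative):

-- models/marts/fct_orders.sql: tag the model by refresh frequency
{{ config(materialized='table', tags=['daily']) }}

# then run every model sharing that tag
dbt run --select tag:daily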
dbt test – Validate Data Quality
Purpose: Execute data tests to ensure quality and consistency.
Key Functionality:
- Runs both schema (YAML-defined) and data (SQL) tests
- Can store test failures for analysis
- Integrates with the DAG for dependency-aware testing
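Concretely, schema tests are declared in YAML next to the model, while data tests are standalone SQL files that should return zero rows. A minimal sketch (file and column names are illustrative):

# models/staging/schema.yml
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique

-- tests/assert_no_negative_amounts.sql (data test: fails if any rows come back)
SELECT *
FROM {{ ref('stg_orders') }}
WHERE amount < 0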
Common Usage Examples:
# Run all tests
dbt test
# Test specific models
dbt test --select my_model
# Only run generic (schema) tests
dbt test --select "test_type:generic"
# Store test failures in the database
dbt test --store-failures
When to Use:
- As part of CI/CD pipelines
- After loading new data
- Before important reports or analyses
Advanced User Tip: Use --store-failures to persist test results for trend analysis.
dbt seed – Load Reference Data
Purpose: Manage static reference data that needs version control.
Key Functionality:
- Loads CSV files from the seeds directory
- Creates or updates tables in your database
- Supports full refreshes when needed
Common Usage Examples:
# Load all seed files
dbt seed
# Load specific seed files
dbt seed --select country_codes
# Force recreation of seed tables
dbt seed --full-refresh
When to Use:
- Loading country/region codes
- Product catalogs
- Reference mappings
- Any static business logic
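For example, a seed is just a version-controlled CSV that downstream models reference with ref() (contents are illustrative):

# seeds/country_codes.csv
country_code,country_name
US,United States
IN,India

-- referencing the seed from a model
SELECT * FROM {{ ref('country_codes') }}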
Advanced User Tip: Keep seed files small (<10MB) as they’re loaded via your local machine.
Conclusion
Mastering the dbt run command and its arguments, flags, options, and parameters provides fine-grained control over your data transformation pipelines. With model selection syntax (like dbt run --select model_name), you can execute chosen portions of your DAG with surgical precision. Flags like --full-refresh and --fail-fast manage runtime behavior, while options such as --threads and --target help optimize performance across different environments. Parameters (--vars) enable dynamic SQL generation, making your models adaptable to changing needs.
Whether you’re debugging, backfilling data, or deploying to production, understanding these elements ensures your pipelines are effective, resilient, and sustainable.
Want to transform your warehouse data effortlessly? Try Hevo Transformer — a powerful dbt Core integration to design and orchestrate transformations with ease.
Frequently Asked Questions (FAQs)
1. How to speed up dbt run?
Increase --threads, use incremental models, and limit model selection.
2. When to use --full-refresh?
Use it for rebuilding incremental models after schema changes or backfills.
3. How to run only modified models in CI/CD?
Use dbt run --select state:modified --defer --state ./prod-artifacts.
4. Best way to pass dynamic values?
Use --vars '{"key":"value"}' and reference with {{ var("key") }} in SQL.