Businesses today depend heavily on their internal data to make informed decisions, so reliable data, backed by a scalable and maintainable data transformation workflow, is a must-have. Data Build Tool (dbt) – an open-source SQL-based framework – brings just that to the table. It empowers data teams to transform, test, and document data directly in their warehouse.

Data Build Tool (dbt) is purpose-built for the transformation step of the ELT (Extract, Load, Transform) workflow of any data pipeline. It runs on top of data warehouse services like Snowflake, BigQuery, and Redshift, leveraging their compute to ensure reliability and maintainability. With dbt, analytics engineers can build modular, version-controlled data pipelines using simple SQL, making it a cornerstone of the modern data stack.

In this blog, let us explore everything you need to know about dbt – from core concepts to advanced techniques, including how to use the dbt run command effectively.

What is Data Build Tool (dbt)?

Data Build Tool (dbt) is an open-source data transformation framework that helps data engineers and analysts build modular, reliable, and highly maintainable data transformation workflows. Its SQL-first approach makes it popular among data professionals. It enables data teams to:

    • Transform raw data into analytics-ready datasets
    • Test data quality with built-in assertions
    • Document models for better collaboration
    • Deploy changes with version control (Git)

    Primarily, dbt follows two key principles:

• Bringing software engineering practices for managing and maintaining code – version control, testing, modularity – to the data transformation workflow
• Leveraging data warehouses like Redshift, BigQuery, and Snowflake as the compute layer that runs the transformation jobs

    How Does dbt Fit in the Modern-Day Data Stack?

    Data Build Tool (dbt) sits between your data warehouse and BI tools, transforming raw data into clean, modeled datasets:

    Data Sources → (Ingestion via Hevo) → Data Warehouse → (Transformation via Hevo Transformer[Powered by dbt Core]) → BI (Looker/Tableau)

    Key Features of dbt

    Let us have a quick glance at the key features of dbt:

| Feature | Description |
| --- | --- |
| SQL-based transformations | Write transformations in SQL; no Spark/Python needed |
| Dependency management | Automatic DAG (Directed Acyclic Graph) generation |
| Modular data modeling | Build reusable models (staging → marts) |
| Data testing | Validate quality with not_null, unique tests |
| Documentation | Auto-generate docs with dbt docs generate |
| Incremental models | Only process new data for efficiency |
| Macros (Jinja templating) | Reusable SQL snippets (e.g., date_trunc helpers) |
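To illustrate the last row, a macro is a reusable, Jinja-templated SQL snippet. Here is a minimal sketch – the macro name, file path, and column name are hypothetical:

```sql
-- macros/cents_to_dollars.sql (hypothetical example macro)
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}
```

Any model can then write {{ cents_to_dollars('amount_cents') }} in its SELECT list, and dbt expands the macro into plain SQL at compile time.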

    Core Components of dbt

    A. Projects & File Structure

    A dbt project is organized as:

my_project/
├── dbt_project.yml       # Config file
├── models/               # SQL models
│   ├── staging/          # Raw data cleaning
│   ├── intermediate/     # Business logic
│   └── marts/            # Final datasets
├── seeds/                # CSV reference data
├── tests/                # Data quality tests
└── macros/               # Reusable SQL functions

    B. Models (The Building Blocks)

    Models are SQL files that define transformations:

-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

SELECT
    id AS order_id,
    customer_id,
    amount
FROM {{ source('shop', 'raw_orders') }}

    C. Materializations (How Models Are Built)

    dbt supports different materialization strategies:

| Type | Description | Use Case |
| --- | --- | --- |
| Table | Creates a physical table | Final datasets |
| View | Creates a virtual view | Lightweight transformations |
| Incremental | Only updates new data | Large fact tables |
| Ephemeral | CTE (no database object) | Intermediate logic |

    Configure in SQL:

    {{ config(materialized='table|view|incremental|ephemeral') }}  
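As a sketch of the incremental strategy, a model can restrict itself to new rows on incremental runs using dbt's is_incremental() helper and the {{ this }} reference to the existing table. The model and column names here are hypothetical:

```sql
-- models/marts/fct_events.sql (hypothetical incremental model)
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, user_id, event_time
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
  -- on incremental runs, only pull rows newer than what this table already holds
  WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
```

On the first run (or with --full-refresh), the is_incremental() block is skipped and the table is rebuilt from scratch.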

    D. Sources, Ref, and Dependencies

    • {{ source() }} – References raw tables
    • {{ ref() }} – Links to other dbt models

    Example:

-- models/marts/orders.sql
SELECT * FROM {{ ref('stg_orders') }}
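The source() call shown earlier resolves through a YAML definition. A minimal sketch for the shop source used in stg_orders might look like this (the file path and schema name are assumptions):

```yaml
# models/staging/sources.yml (hypothetical source definition)
version: 2

sources:
  - name: shop
    schema: raw          # assumed schema holding the raw tables
    tables:
      - name: raw_orders
```

{{ source('shop', 'raw_orders') }} then compiles to the fully qualified raw table name, and dbt tracks it as an upstream node in the DAG.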

    Data Build Tool (dbt) Command Line Utility

The command-line interface offered by dbt is your one-stop window to run, test, and manage dbt workflows. Each dbt command serves a specific purpose in the data transformation workflow. Let us explore the major dbt commands with detailed explanations of their functionality, common use cases, and practical examples.

    dbt run – Execute Your Data Models

    Purpose: The primary command for executing your dbt models in the correct dependency order.

    Key Functionality:

    • Compiles SQL models into executable queries
    • Determines execution order using the DAG (Directed Acyclic Graph)
    • Handles incremental model updates
    • Manages database transactions

    Common Usage Examples:

# Basic model execution
dbt run

# Run specific models with their dependencies
dbt run --select +my_model+

# Force full refresh of incremental models
dbt run --full-refresh

# Run with parallel threads for performance
dbt run --threads 8

    When to Use:

    • During development to test model changes
    • In production pipelines to update datasets
    • When you need to rebuild specific portions of your DAG

      Advanced User Tip: Combine with selection syntax to precisely control which models execute.

      Dive Deeper into the dbt run Command

      The dbt run command is the workhorse of dbt (Data Build Tool), responsible for executing your data transformation models in the correct dependency order. This section provides an in-depth look at the four powerful control mechanisms that give you granular control over model execution:

      • Arguments – Select specific models to run
      • Flags – Boolean switches that modify behavior
      • Options – Key-value configurations
      • Parameters – Runtime variables for dynamic SQL

      Understanding these components will help you optimize performance, debug efficiently, and customize runs for different environments.

      dbt run Arguments: Precision Model Selection

      Arguments control which models get executed through flexible selection syntax.

| Argument Pattern | Description | Example |
| --- | --- | --- |
| --select model_name | Run a single model | dbt run --select orders |
| --select +model_name+ | Model plus all upstream and downstream dependencies | dbt run --select +orders+ |
| --select @model_name | Model, its descendants, and their ancestors | dbt run --select @orders |
| --select tag:tag_name | Tag-based selection | dbt run --select tag:daily |
| --select path/to/models | Directory selection | dbt run --select models/marts |
| --select config.materialized:table | Materialization filter | dbt run --select config.materialized:incremental |
| --exclude model_name | Exclusion pattern | dbt run --exclude slow_model |

      Advanced Selection Examples:

# Multiple selection criteria
dbt run --select "tag:hourly+,config.materialized:incremental"

# State-based execution (CI/CD)
dbt run --select state:modified
dbt run --select state:new

# Hybrid patterns
dbt run --select "marts/sales/+,tag:production"

# Combine + and tag
dbt run --select +tag:critical+ --exclude tag:deprecated

      dbt run Flags: Execution Behavior Modifiers

      Flags are boolean switches (true/false) that alter how models run.

| Flag | Purpose | Use Case |
| --- | --- | --- |
| --full-refresh | Forces full rebuild of incremental models | Schema changes, backfills |
| --fail-fast | Stops execution on first failure | CI pipelines, debugging |
| --dry-run | Compiles without executing | Change validation |
| --debug | Shows detailed debug logs | Troubleshooting |
| --quiet | Suppresses non-essential output | Scripting/automation |
| --no-version-check | Skips dbt version verification | Airflow/constrained environments |

      Flag Combinations:

# Safe execution pattern
dbt run --full-refresh --fail-fast --threads 4

# Debugging workflow
dbt run --select problem_model --debug --fail-fast

      Note: Flags can be combined with any selection arguments.

      dbt run Options: Configurable Execution Parameters

      Options accept values to fine-tune execution.

| Option | Values | Description | Example |
| --- | --- | --- | --- |
| --threads | Integer | Parallel execution threads | --threads 8 |
| --target | Profile target | Environment selection | --target prod |
| --vars | JSON/YAML | Runtime variables | --vars '{"key":"value"}' |
| --state | Path | Artifacts for state comparison | --state ./prod_artifacts |
| --defer | (boolean flag) | Use another environment's references | --defer --state … |
| --profiles-dir | Path | Alternate profiles directory | --profiles-dir ./secure |

      Option Examples:

# Production deployment with variables
dbt run --target prod --vars '{"run_date":"2023-01-01"}'

# State comparison in CI
dbt run --select state:modified --state ./prod_artifacts

# High-parallelism run
dbt run --threads 16 --target dev

      dbt run Parameters: Dynamic SQL Variables

      Parameters (variables) enable runtime customization of SQL.

      Implementation Guide:

1. Define variables in models:

-- models/orders.sql
SELECT * FROM {{ ref('stg_orders') }}
WHERE order_date >= '{{ var("start_date") }}'

2. Pass values via the CLI:

dbt run --vars '{"start_date":"2023-01-01"}'

3. Set defaults in dbt_project.yml:

vars:
  start_date: "2020-01-01"

      Advanced Parameter Techniques:

# Multi-variable input
dbt run --vars '{"start_date":"2023-01-01", "end_date":"2023-12-31"}'

# YAML format for complex values
dbt run --vars "{key: value, nested: {key2: value2}}"

      Execution Strategy Cheat Sheet

| Scenario | Recommended Command |
| --- | --- |
| Development testing | dbt run --select +my_model |
| Production refresh | dbt run --target prod --full-refresh --threads 8 |
| CI pipeline | dbt run --select state:modified --fail-fast |
| Debugging | dbt run --select problem_model --debug |
| Backfill | dbt run --select config.materialized:incremental --full-refresh --vars '{"backfill": true}' |

      Best Practices

      • Start small: Test with --select before performing full runs
• Use threads wisely: tune --threads to your warehouse's concurrency limits rather than setting it arbitrarily high
      • Tag strategically: Organize models by refresh frequency
      • Leverage state: Use --state for efficient CI/CD
      • Document variables: Maintain a VARIABLES.md markdown file for reference

      dbt test – Validate Data Quality

      Purpose: Execute data tests to ensure quality and consistency.

      Key Functionality:

      • Runs both schema (YAML-defined) and data (SQL) tests
      • Can store test failures for analysis
      • Integrates with the DAG for dependency-aware testing
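Schema tests are declared in YAML files alongside your models. A minimal sketch, reusing column names from the earlier stg_orders example (the file path and description are assumptions):

```yaml
# models/staging/schema.yml (hypothetical test definitions)
version: 2

models:
  - name: stg_orders
    description: "Cleaned order records"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - not_null
```

Data (singular) tests, by contrast, are plain SQL files in tests/ that SELECT failing rows; dbt test treats any returned row as a failure.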

      Common Usage Examples:

# Run all tests
dbt test

# Test specific models
dbt test --select my_model

# Only run schema (generic) tests – older dbt versions used --schema
dbt test --select "test_type:generic"

# Store test failures in the database
dbt test --store-failures

      When to Use:

      • As part of CI/CD pipelines
      • After loading new data
      • Before important reports or analyses

      Advanced User Tip: Use --store-failures to persist test results for trend analysis.

      dbt seed – Load Reference Data

      Purpose: Manage static reference data that needs version control.

      Key Functionality:

      • Loads CSV files from the seeds directory
      • Creates or updates tables in your database
      • Supports full refreshes when needed

      Common Usage Examples:

# Load all seed files
dbt seed

# Load specific seed files
dbt seed --select country_codes

# Force recreation of seed tables
dbt seed --full-refresh
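Once loaded, a seed table behaves like any other model and can be joined via ref(). A sketch, assuming a hypothetical seeds/country_codes.csv with code and name columns, and a country_code column on orders:

```sql
-- any model can join against the seeded table
SELECT
    o.order_id,
    c.name AS country_name
FROM {{ ref('stg_orders') }} AS o
LEFT JOIN {{ ref('country_codes') }} AS c
    ON o.country_code = c.code
```

Because the join goes through ref(), dbt also adds the seed as an upstream dependency in the DAG.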

      When to Use:

      • Loading country/region codes
      • Product catalogs
      • Reference mappings
      • Any static business logic

      Advanced User Tip: Keep seed files small (<10MB) as they’re loaded via your local machine.

      Conclusion

      Mastering the dbt run command and its arguments, flags, options, and parameters provides fine-grained control over your data transformation pipelines. With model selection syntax (like dbt run --select model_name), you can execute chosen portions of your DAG with surgical precision. Flags like --full-refresh and --fail-fast manage runtime behavior, while options such as --threads and --target help optimize performance across different environments. Parameters (--vars) enable dynamic SQL generation, making your models adaptable to changing needs.

      Whether you’re debugging, backfilling data, or deploying to production, understanding these elements ensures your pipelines are effective, resilient, and sustainable.

      Want to transform your warehouse data effortlessly? Try Hevo Transformer — a powerful dbt Core integration to design and orchestrate transformations with ease.

      Frequently Asked Questions (FAQs)

      1. How to speed up dbt run?

      Increase --threads, use incremental models, and limit model selection.

      2. When to use --full-refresh?

      Use it for rebuilding incremental models after schema changes or backfills.

      3. How to run only modified models in CI/CD?

      Use dbt run --select state:modified --defer --state ./prod-artifacts.

      4. Best way to pass dynamic values?

      Use --vars '{"key":"value"}' and reference with {{ var("key") }} in SQL.

      Raju Mandal
      Senior Data Engineer

      Raju is a Certified Data Engineer and Data Science & Analytics Specialist with over 8 years of experience in the technical field and 5 years in the data industry. He excels in providing end-to-end data solutions, from extraction and modeling to deploying dynamic data pipelines and dashboards. His enthusiasm for data architecture and visualization motivates him to create informative technical content that simplifies complicated concepts for data practitioners and business leaders.