dbt (data build tool) is an analytics engineering tool that lets data teams transform raw data in their warehouse into well-organized, analysis-ready datasets. Tags are one of its handier features: they help you organize, filter, and selectively run models.

In this article, we’ll go in-depth on dbt tags: what they are, how to use them, best practices, and practical applications. Whether you are just getting started or scaling your dbt project past a hundred models, learning tags will save you time and give structure to your workflow.

What Are dbt Tags?

Tags in dbt are metadata annotations that are applied to models, tests, sources, snapshots, seeds, and other dbt resources. Tags enable the grouping of models and offer flexibility in execution.

Tags are generally declared in a model’s .yml file in the models directory. They let you run or skip specific subsets of models based on logical groupings.

Example of assigning a tag to a model in models/schema.yml:

models:
  - name: customer_orders
    description: "Aggregated orders for each customer"
    config:
      tags: ['finance', 'monthly_report']

In this example, the customer_orders model is tagged with finance and monthly_report, so dbt commands can select it by either tag.

Why Use dbt Tags?

  • Improved Organization: Organize models by business domain, team, or dependency (e.g., finance, sales).
  • Selective Execution: Execute only the models pertinent to a certain task.
  • Optimized Performance: Prevent running unnecessary models, conserving compute resources.
  • Granular Testing & Debugging: Run tests selectively for a tagged group of models.
  • Effective Deployment: Deploy only a subset of models to production while retaining others in development.
  • Environment Targeting: Employ tags such as dev_only or prod_ready.

How to Add Tags in dbt

Tags can also be set with the {{ config() }} macro at the top of a model’s SQL file:

{{ config(materialized="table",
  tags=["sales", "important"]
) }}

You can also add tags to tests in your .yml files:

version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique:
              config:
                tags: ["core", "critical"]
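
Tags can also be applied to every model in a folder from dbt_project.yml, which avoids repeating the same tag in each file. A minimal sketch, assuming a project named my_project with models/finance and models/marketing folders (the project name, folder names, and the weekly tag are all hypothetical):

```yaml
# dbt_project.yml (fragment); project and folder names are assumptions
models:
  my_project:
    finance:
      +tags: "finance"                 # applies to every model under models/finance/
    marketing:
      +tags: ["marketing", "weekly"]   # a list assigns several tags at once
```

Tags accumulate rather than override: a model under models/finance that also sets tags=["important"] in its config() block ends up with both finance and important.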

How to Use dbt Tags?

1. Running Models by Specific Tags

You can execute only the models with a particular tag using the --select flag.

dbt run --select tag:finance

This command executes all models with the finance tag.

2. Excluding Models Based on Tags

To exclude models by a particular tag, use the --exclude flag.

dbt run --exclude tag:monthly_report

This executes all models except models tagged as monthly_report.

3. Combining Multiple Tags

To select models that match any of several tags, separate the selectors with spaces:

dbt run --select tag:finance tag:marketing

This will run models labeled as either finance or marketing.
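
If the same combination is run often, it can be saved as a named YAML selector. The sketch below assumes a selectors.yml file at the project root, and the selector names are hypothetical. It also shows the intersection form, which matches only models that carry both tags.

```yaml
# selectors.yml (project root); selector names below are illustrative
selectors:
  - name: finance_or_marketing
    description: "Models tagged finance OR marketing"
    definition:
      union:
        - method: tag
          value: finance
        - method: tag
          value: marketing

  - name: finance_and_marketing
    description: "Models tagged finance AND marketing"
    definition:
      intersection:
        - method: tag
          value: finance
        - method: tag
          value: marketing
```

A saved selector is invoked with the --selector flag, for example dbt run --selector finance_or_marketing. On the command line, the intersection is written with a comma instead of a space: tag:finance,tag:marketing.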

4. Using Tags in Tests

You can also use tags when running tests in dbt:

dbt test --select tag:data_quality

This runs tests that are tagged data_quality, as well as tests attached to models that carry the data_quality tag.

5. Using Tags in Documentation Generation

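Recent dbt versions also accept --select on dbt docs generate, which limits the nodes written to the catalog:

dbt docs generate --select tag:critical

This is useful when you only need catalog metadata for your most important models, for example on a large project where a full catalog build is slow.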

Real-World Use Cases

Case 1: Department-Based Tagging

Suppose you work in an organization with dedicated data teams for marketing, sales, and finance. You can tag your models as follows:

{{ config(tags=["marketing"]) }}

Then run only what you need:

dbt run --select tag:marketing

Case 2: Run Only High-Priority Models in Production

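You may want to run only your high-priority, production-ready models during peak hours:

{{ config(tags=["high_priority", "prod_ready"]) }}

dbt run --select tag:high_priority,tag:prod_ready

The comma selects the intersection, so only models carrying both tags run. Separating the two selectors with a space would instead run models that have either tag.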

Case 3: CI/CD Testing

In CI environments, you may only wish to test models with the tag critical:

dbt test --select tag:critical

Advanced Use Cases of dbt Tags

1. Tagging Models by Environment

If you have multiple environments, such as dev, staging, and prod, you can tag models as follows:

models:
  - name: sales_forecast
    config:
      tags: ['staging']

Then execute only staging models:

dbt run --select tag:staging

2. Tagging Models by Business Unit

If various teams own various parts of the warehouse, tags can be useful:

models:
  - name: customer_revenue
    config:
      tags: ['marketing', 'revenue_analysis']

Now, marketing teams can run only their models:

dbt run --select tag:marketing

3. Tagging Models for Incremental Loads

You may want to distinguish between full-refresh and incremental models:

models:
  - name: transactions
    config:
      tags: ['incremental']

Execute only incremental models:

dbt run --select tag:incremental
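
A hand-applied incremental tag can drift from how a model is actually materialized. One way to guard against that, sketched here with a hypothetical selector name, is to intersect the tag with dbt's config.materialized selection method:

```yaml
# selectors.yml (fragment); the selector name is illustrative
selectors:
  - name: tagged_and_incremental
    description: "Models tagged incremental that are also materialized as incremental"
    definition:
      intersection:
        - method: tag
          value: incremental
        - method: config.materialized
          value: incremental
```

Run it with dbt run --selector tagged_and_incremental. The command-line equivalent is --select tag:incremental,config.materialized:incremental.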

Best Practices for Applying dbt Tags

  • Follow a Consistent Tagging Convention: Standardize tag names across projects, e.g., team_analytics, business_kpi, etc.
  • Limit Tags per Model: Avoid over-tagging so that selections stay easy to reason about.
  • Document Your Use of Tags: Maintain a reference in a README or internal doc.
  • Use Tags with dbt Artifacts: Use the metadata dbt writes to manifest.json, or the generated docs site, to inspect which resources carry which tags.
  • Review Tags Regularly: Delete unused or unnecessary tags to keep configurations tidy.

Common Mistakes to Avoid

  • Assuming tags introduce dependencies: Tags do not affect the DAG (dependency graph); they only label resources for selection.
  • Tagging inconsistently: Applying ‘finance’ in one model and a variant such as ‘model finance’ in another makes tag-based selection unreliable.
  • Not using CLI filters: Tags are most effective when used with --select and --exclude.

Conclusion

dbt tags offer an effective way to organize, run, and manage models. Whether you are running a subset of models, streamlining tests, or structuring deployments, tags give you the flexibility to automate those workflows. By following the best practices above, teams can improve model governance and get more out of their dbt projects.

By using dbt tags judiciously, data teams are able to make their data transformation pipelines more scalable and maintainable, which can result in enhanced data governance and operational efficiency.

Ready to dive further? Check out dbt’s documentation on selectors and model selection syntax.

Enjoy tagging! 

FAQs

1. Can I tag one model with multiple tags?

Yes, dbt supports applying multiple tags to a single model, allowing flexible execution and more appropriate grouping for selective runs, testing, or doc generation.

2. Do tags have any influence on model dependencies in dbt?

No, tags have no effect on the DAG or dependency graph. They are purely metadata that serve for organization, filtering, and control of execution, not for specifying model relationships.

3. Where can I set tags in dbt?

Tags can be specified in your model’s .yml file or in the model file itself with the {{ config() }} block at the beginning of the SQL file.

4. Can I use tags in dbt tests and docs?

Yes, you can tag tests and select them with dbt test, and you can pass a tag selector to dbt docs generate, so that validations run or documentation is generated only for tagged resources.

      Sarang Ravate
      Senior Software Engineer

      Sarang is a skilled Data Engineer with over 5 years of experience, blending his expertise in technology with a passion for design and entrepreneurship. He thrives at the intersection of these fields, driving innovation and crafting solutions that seamlessly integrate data engineering with creative thinking.