Getting Started with dbt CTEs

If you are a SQL developer, there might have been times when you’ve dealt with huge SQL scripts, hundreds of lines full of nested subqueries, and wondered, “What is going on here?” It can hurt your brain trying to trace where the data is coming from. If you’ve been there, you’re not alone.

That’s exactly where CTEs (Common Table Expressions) come in. They let you break your SQL logic into bite-sized, understandable chunks. When you combine the power of CTEs with dbt (Data Build Tool), one of the most loved tools in modern data workflows, you get clean, modular, and easy-to-maintain transformation logic.

Thus, in this blog, let us explore how CTEs, when combined with a dbt workflow, referred to as a dbt cte approach, can make data modeling more elegant and well-structured, making more sense to you and the people you work with.

Table of Contents

What is dbt?

But before diving into the technicalities, let’s do a quick refresher to make sure we’re on the same page about what dbt actually is.

dbt stands for Data Build Tool, and at its core, it’s a framework that helps you transform data inside your data warehouse using SQL. It’s not an ETL tool in the traditional sense—it doesn’t extract or load data. Instead, it picks up right where your raw data lands (think: Snowflake, BigQuery, Redshift, etc.) and transforms it into something analytics-ready.

But you know what the coolest part is? dbt brings software engineering best practices to analytics work:

You write SQL models and save them as SQL scripts.
You can version control your transformations using any Git-based technology.
You can test your data, document it, and even generate a whole landing page with that documentation.
It encourages modular, maintainable data logic through features like ref(), sources, and yes, also CTEs.

So, instead of writing one massive SQL script and running it in a warehouse console, dbt lets you break it down into smaller, reusable, testable models. When you run dbt run, it compiles all your SQL, resolves dependencies, and runs the transformations in the correct order.

What is a CTE (Common Table Expression)?

Alright, let’s demystify the concept of a CTE. If you’ve ever written SQL that felt like it needed a whiteboard and three cups of coffee to understand, CTEs are about to become your new best friend.

CTE stands for Common Table Expression, which is really just a fancy term for a temporary, named result set in SQL that you can refer to later in your query.

Here’s what a CTE looks like:

WITH cleaned_orders AS (
    SELECT * FROM orders WHERE status IS NOT NULL
)
SELECT * FROM cleaned_orders;

Easy, right?

The WITH clause lets you define one or more subqueries (CTEs), each with a name (cleaned_orders in this case). Then, in your main query (after the CTEs), you can treat that CTE like it’s a regular table.

But, Why Use CTEs?

Readability: Break complex queries into logical steps.
Maintainability: Easier to troubleshoot and update just one piece of logic.
Modularity: Focus on one step at a time, like functions in programming.
Debugging: Test individual CTEs independently before combining them.

Think of CTEs like Lego blocks for SQL, you build something simple, stack it up, and before you know it, you’ve got a well-structured data transformation instead of a spaghetti mess of SQL.

Why CTEs Are Essential in dbt Models

Now that we know what CTEs are, let’s talk about why they’re such a natural fit in dbt projects. If you’ve ever worked on a data model that evolved over time (which is pretty much every data model), you know how quickly things can get messy. dbt and CTEs together help keep things less chaotic.

You know how?

Bringing Clarity to the logic

When writing dbt models, you often build up logic over several layers, like cleaning raw data, joining with other tables, filtering out test data, calculating metrics, etc.

CTEs let you represent each of those steps clearly. Instead of writing one massive SELECT with nested subqueries, you can stack your logic layer by layer. It’s like writing a recipe: one instruction per line.

Easy Collaboration

Let’s say you’re working with a team, and your teammate picks up your dbt model. If you’ve used clear, named CTEs like filtered_orders, active_customers, or revenue_by_customer, they’ll understand your thought process immediately.

Good CTE naming is like writing clean code that communicates intent.

Easy Debugging

Ever tried debugging a query with 5 nested SELECTs? With CTEs, you can comment out parts, run individual steps, or even extract a CTE into a separate dbt model temporarily to test it in isolation.

This makes your debugging workflow faster and way less frustrating.

Modularity

dbt encourages splitting your logic across multiple models using things like:

ref('model_name') for dependency tracking
source() for raw data
Ephemeral models for logic reuse

CTEs fit right into this modular mindset. They keep your logic tidy within a single model, just like ref() keeps things tidy across models.

Testing and Documentation

Want to test the output of a transformation step? Just pull that logic into a CTE, and boom, you can wrap a test around the final model, knowing your building blocks are solid.

Want to document how your revenue model works? CTEs give you natural breakpoints to add descriptions and context.

dbt CTE in Action: A Hands-On Example

Alright, let’s bring all this theory to life with a real-world example. Imagine you’re working with an online retail dataset. You’ve got:

A table of orders
A table of order items
You want to build a dbt model that calculates total revenue per customer

Let us walk through it together step by step, using CTEs to make the logic crystal clear.

Step 1: Clean the Orders Data

Let’s start by filtering out any orders that have missing values or weird statuses.

WITH cleaned_orders AS (
    SELECT
        order_id,
        customer_id,
        status,
        order_date
    FROM {{ ref('raw_orders') }}
    WHERE status = 'completed'
)

This CTE will set the foundation and filter out only complete, usable orders going forward.

Step 2: Bring in the Order Items

Next, we pull in the items that were part of each order.

order_items AS (
    SELECT
        order_id,
        item_price,
        quantity
    FROM {{ ref('raw_order_items') }}
)

This CTE keeps raw item-level detail separate and reusable.

Step 3: Join Orders with Items

Now we combine the cleaned orders with their items to calculate revenue.

joined_data AS (
    SELECT
        o.customer_id,
        i.item_price,
        i.quantity
    FROM cleaned_orders o
    JOIN order_items i ON o.order_id = i.order_id
)

This CTE is our core logic: cleaned, filtered, and joined. The final metric is then built upon this.

Step 4: Aggregate Revenue by Customer

Now we calculate the total revenue per customer in the final SELECT:

SELECT
    customer_id,
    SUM(item_price * quantity) AS total_revenue
FROM joined_data
GROUP BY customer_id

This is your model’s output. It’s clean, readable, and focused on business requirements.

What Makes This Great?

Every transformation is in a separate CTE.
You can test any step independently just by commenting out the rest.
It’s self-explanatory, no more guessing what a nested subquery is doing.
It fits naturally into a dbt model file (models/revenue_by_customer.sql), and will work with dbt run just like that.

Best Practices for Writing dbt Models Using CTEs

After understanding the concept and its ecosystem, it’s equally important to know how you use them. It makes a big difference. Here are some industry-vetted best practices to help you write clean, scalable, and lovable dbt models using CTEs.

Use Descriptive and Purposeful CTE Names

Don’t name your CTEs things like cte1, temp, or x. Give them names that reflect what they actually represent.

Example:

WITH valid_orders AS (
    ...
),
items_with_prices AS (
    ...
)

Think of CTE names like variable names in code: the more descriptive, the easier it is to understand what’s going on.

One Transformation Step per CTE

Try to keep each CTE focused on a single logical step in your transformation. If a CTE is doing multiple things – cleaning, joining, and calculating, you might want to break it into smaller pieces. This helps with:

Debugging
Testing logic
Reading the model top-to-bottom like a story

Keep Your Final SELECT Focused

The final SELECT in your model should ideally be just the output logic, not more joins, filters, or transformations. If your final block looks cluttered, move those steps into earlier CTEs.

Rule of thumb: The final SELECT should feel like a “summary” of the work above.

Use Comments to Explain Complex Logic

Even if your CTE names are descriptive, it never hurts to add a quick comment explaining why a particular step exists.

-- Exclude test orders and filter for completed status
WITH filtered_orders AS (
    SELECT * FROM ...
)

A small comment now can save someone else (or yourself) 30 minutes later.

Don’t Repeat Yourself (DRY)

If you’re using the same logic in multiple places across models, consider abstracting that into:

A shared dbt model (use ref())
Or a macro if it’s reusable logic with parameters

CTEs are great, but you don’t want to copy-paste logic over and over. dbt’s modular design helps avoid that.

Conclusion

In conclusion, mastering CTEs within dbt models isn’t just about writing cleaner SQL. It’s about adopting a mindset of modularity, clarity, and collaboration. Whether you’re transforming raw data, building metrics, or scaling a data pipeline across teams, using CTEs the right way will help you write models that are easier to understand, maintain, and trust.

By applying the best practices and avoiding common pitfalls we’ve discussed, you’ll be well on your way to becoming not just a dbt user, but a thoughtful, strategic data modeler. Keep building, keep learning, and let your SQL tell a story that’s clear to everyone who reads it.

Want to take your dbt transformations to the next level? Try Hevo Transformer — our powerful new feature that integrates seamlessly with dbt Core, enabling you to build, manage, and orchestrate transformations right inside your data warehouse with ease.

Frequently Asked Questions (FAQs)

1. Can I use multiple CTEs in a single dbt model?

Absolutely! You can stack as many CTEs as you need, as long as each one builds on the logic of the previous ones.

2. What’s the difference between a CTE and a subquery?

Functionally, both do similar things. But CTEs are named and defined at the top of your query, making them easier to read and reuse.

3. Should I always use CTEs in dbt models?

Not always, but in most cases, yes. CTEs make your models easier to maintain and explain, especially when your logic involves multiple steps.

Raju Mandal Senior Data Engineer

Raju is a Certified Data Engineer and Data Science & Analytics Specialist with over 8 years of experience in the technical field and 5 years in the data industry. He excels in providing end-to-end data solutions, from extraction and modeling to deploying dynamic data pipelines and dashboards. His enthusiasm for data architecture and visualization motivates him to create informative technical content that simplifies complicated concepts for data practitioners and business leaders.