Data engineers and analysts rely on dbt because it makes data transformation scalable, bridging the gap between business-driven analytics and engineering-heavy ETL. But what if you need to rebuild an entire table rather than update it incrementally?

This article covers how to use dbt full refresh to keep your tables in sync with the source, and when you should reach for it. But before that, let’s understand what dbt is.

What is dbt?

Your business probably has data coming from multiple sources. You’ll need to find a way to combine these and extract valuable insights. The traditional way could be downloading and merging them into a single spreadsheet. But that’s time-consuming and prone to manual errors.

With modern data tools, it’s easier to extract and load data from various sources into a cloud warehouse. Once the data lands there, dbt comes into play. dbt runs directly within the warehouse: it’s a scalable tool that lets you clean and transform data so it’s ready for downstream applications like dashboards and BI reporting.
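For context, a dbt model is just a SELECT statement saved in a .sql file; dbt compiles it and materializes the result as a table or view in your warehouse. Here’s a minimal sketch (the raw.orders source table and its columns are assumptions for illustration):

```sql
-- models/order_totals.sql (hypothetical model)
-- dbt wraps this SELECT in the DDL needed to build a table in the warehouse.
{{ config(materialized='table') }}

SELECT
    customer_id,
    SUM(order_total) AS total_spent
FROM raw.orders
GROUP BY customer_id
```

Running dbt run builds this model inside the warehouse itself; the data never leaves it.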

Full refresh in dbt

Imagine you are working as a data engineer at an e-commerce company, and you are assigned to track customer purchases and their spending patterns. However, due to an error, customers were assigned the wrong transactions and membership levels.

In this case, you’ll need to run dbt with the full refresh flag instead of a normal incremental run, as the latter only updates recent records, not historical ones.

Basically, when you need to completely rebuild a table—not just update recent records—you must run the following command:

dbt run --full-refresh

This approach ensures accuracy and consistency in transformations. Schema changes are applied to all records, making your model a replica of source data.
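Under the hood, a full refresh amounts to dropping the existing relation and rebuilding it from the model’s full SELECT. The exact DDL varies by warehouse adapter, but conceptually it is equivalent to this sketch (the schema and table names are illustrative):

```sql
-- Roughly what dbt does on --full-refresh (simplified; adapter-specific in practice)
DROP TABLE IF EXISTS analytics.customer_spending;

CREATE TABLE analytics.customer_spending AS
SELECT
    customer_id,
    SUM(order_total) AS total_spent
FROM raw.orders
GROUP BY customer_id;
```

Because every row is recomputed from the source, historical errors and schema changes are picked up in one pass.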

Configuring dbt full refresh

Let’s say you have a customer spending model. The following command drops and recreates the customer_spending table or view in your database.

dbt run --full-refresh --select customer_spending

But if you want finer control over which models can be fully refreshed, you can gate the behavior behind a flag and configure it per model. Let me walk you through that.

First, set up the configuration in your model that supports controlled transformations.

{{ config(
    materialized='incremental',
    unique_key='order_id',
    full_refresh=var('full_refresh', false)
) }}

Here we use the var() function to read a runtime variable that switches between incremental and full refresh mode. By default, the full_refresh flag is set to false, meaning the model runs incrementally unless explicitly overridden. To force a full table rebuild, run the following command with full_refresh set to true.

dbt run --vars 'full_refresh: true' --select customer_spending

We’re setting a custom flag for full refresh, which will be used inside the model. This will apply only to the customer_spending table.
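If you’d rather not repeat the inline default across models, you can also declare the variable’s default once in dbt_project.yml (a sketch; this assumes a standard project file):

```yaml
# dbt_project.yml — project-wide default for the custom flag
vars:
  full_refresh: false
```

A --vars value passed on the command line overrides this default at runtime.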

dbt Incremental Models vs. Full Refresh

If you understand full refresh, you’ll agree it ensures data accuracy. But it’s not always feasible—it’s slow and costly. Let’s briefly compare full refresh and incremental models.

Incremental Model

  • Execution time is low, as it processes and updates only new or recently changed records.
  • Cheaper to run: cloud warehouses bill by query compute time, so processing less data saves money.
  • Suitable for frequent schedules, such as daily incremental refreshes.

Full Refresh

  • Compute cost is high because the entire table is reprocessed.
  • All the data is reloaded, so runs take much longer.
  • You can schedule a full refresh weekly or monthly to maintain data quality.
  • It’s ideal only for large-scale corrections and schema changes.
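A common way to put this trade-off into practice is to schedule incremental runs frequently and reserve the full rebuild for a quiet window. A crontab sketch (the paths, schedule, and model name are assumptions):

```shell
# Daily incremental run at 2am, Monday–Saturday
0 2 * * 1-6 cd /opt/analytics && dbt run --select customer_spending
# Weekly full refresh on Sunday at 3am
0 3 * * 0   cd /opt/analytics && dbt run --full-refresh --select customer_spending
```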

Actions to Avoid

You should restrict developers from executing the full refresh command on certain models in your project. A full refresh drops and recreates the model along with its dependent models, and on large tables the rebuild can take hours, blocking queries against those tables and slowing down downstream pipelines.

You can either disable full refresh on a model outright using conditional logic, or add a flag that defaults to false and opt in per run, as covered earlier.
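For the first option, dbt lets you pin this directly in the model config: setting full_refresh to false makes the model ignore the --full-refresh flag entirely. A sketch of that config block:

```sql
-- With full_refresh=false, `dbt run --full-refresh` still runs this model incrementally.
{{ config(
    materialized='incremental',
    unique_key='order_id',
    full_refresh=false
) }}
```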

Example

Let’s use the same example of customer spending patterns.

Step 1: Configure Full Refresh in SQL Model (.sql File)

-- full_refresh defaults to false; override it at runtime to force a rebuild
{{ config(
    materialized='incremental',
    unique_key='customer_id',
    full_refresh=var('full_refresh', false)
) }}

SELECT
    customer_id,
    SUM(order_total) AS total_spent,
    MAX(order_date) AS last_purchase_date
FROM raw.orders
{% if is_incremental() %}
-- Only filter against the existing table on incremental runs;
-- on the first run or a full refresh, {{ this }} may not exist yet.
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
GROUP BY customer_id

You might wonder why that WHERE clause is at the end. Running a full refresh every day is not advisable; once a week or month is typical. On regular runs, the WHERE filter in your incremental model ensures that only new or updated data is processed instead of reloading the entire dataset every time.

Step 2: Run your model

dbt run --full-refresh --select customer_spending

Alternatively, trigger the full refresh through the custom variable only when needed:

dbt run --vars 'full_refresh: true' --select customer_spending

This is more convenient, as you don’t have to modify the SQL file every time.

Step 3: Always test after full refresh

dbt test --select customer_spending
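dbt test runs the tests defined for the model in its YAML properties file. If none exist yet, a minimal sketch might look like this (the file path and choice of tests are assumptions):

```yaml
# models/schema.yml (hypothetical) — basic integrity tests for customer_spending
version: 2

models:
  - name: customer_spending
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
```

These checks confirm the rebuilt table has no duplicate or missing customer IDs before downstream dashboards consume it.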

Conclusion

Full refresh is a powerful tool for data transformation. Once you understand it, you can use it in your transformation models whenever needed, weighing the computational cost for optimal usage.

With Hevo Transformer, you can seamlessly integrate dbt core and automate your data transformations directly within your warehouse. Sign up for free today!

Srujana Maddula
Technical Content Writer

Srujana is a seasoned technical content writer with over 3 years of experience. She specializes in data integration and analysis and has worked as a data scientist at Target. Using her skills, she develops thoroughly researched content that uncovers insights and offers actionable solutions to help organizations navigate and excel in the complex data landscape.