3 Handy Benefits of dbt Incremental Models [+3 Use Cases]
“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.”
– Geoffrey Moore, American management consultant and author
The quote holds true not just for big data, but for data analytics in general. I assume you are one of the smart deer; that's why you are here. Now, let's do a quick check on how far you have taken data analytics in your company: have you ever used dbt incremental models?
In this blog, I will walk you through the details of dbt incremental models including the use cases and advantages. Ready to dive in? Let’s start from scratch!
What are dbt Incremental Models?
In dbt, an incremental model is a type of data model that only processes fresh or updated data rather than reprocessing the entire dataset on every run.
Instead of processing the whole data set each time you run a dbt model, the incremental strategy in dbt lets you update just a part of your data warehouse.
An incremental model simply adds new data to an existing dataset, and there are two ways that can happen:
1. The incoming row's key does not yet exist in the target table, so the row is appended with its new primary key.
2. The key already exists, so the existing row is upserted with the new data.
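To make this concrete, here is a rough sketch of the kind of merge statement a warehouse might run under the hood when a unique key is configured. The table and column names (analytics.orders, order_id, order_status) are purely illustrative, and the exact statement dbt generates varies by warehouse:

```sql
-- Illustrative only: roughly how new rows meet the target table.
merge into analytics.orders as target
using staged_new_rows as source
    on target.order_id = source.order_id       -- the unique key
when matched then                              -- case 2: key exists, upsert
    update set order_status = source.order_status
when not matched then                          -- case 1: new key, append
    insert (order_id, order_status)
    values (source.order_id, source.order_status);
```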
Tables for incremental models are created in your data warehouse. On the model's first run, the table is built by transforming every row of the source data. On subsequent dbt runs, only the rows in your source data that you specifically ask dbt to transform are added to the existing target table.
The rows you filter for in an incremental run are frequently the rows in your source data that have been created or updated since dbt last ran. In this way, your model is built up incrementally with each dbt execution.
By using an incremental model, you cut down the volume of data that has to be transformed on each run. This enhances warehouse performance and lowers compute costs.
Now, you have the basics. In the next section, let’s understand how it works.
How do Incremental Models Work?
Different data warehouses handle dbt's incremental materialization in different ways. Where supported, a merge statement is used to update existing rows and insert new ones. On warehouses that do not support merge statements, the records in the target table that need to be modified are first removed with a delete statement.
That delete is followed by an insert statement, and transaction management is used to ensure the two complete as a single unit of work.
Now, when should you go for incremental models?
- When source data is huge and models take too much time to run: This is when the source data transformations are computationally intensive (that is, they take a long time to execute). For instance, the underlying data source may be very large, or complicated regex functions may be applied to it. There is often a trade-off between the expense of transforming a data source and the complexity of applying incremental logic: using incremental logic is a smart idea if the cost and time of rebuilding the table on every dbt run proves problematic.
- When historical data doesn’t change: This is when the historical data is relatively static and requires no revisions. Conversely, if the historical data does change and needs to be updated, capturing and transforming those changes with incremental logic is harder. In that case, weigh the added complexity against the resources saved to decide whether incremental logic is right for you.
- When table updates are needed frequently: Applying incremental logic is beneficial when your stakeholders need the data from an underlying table to be frequently updated.
I believe you now have an idea of how it works. How about I take you through the key benefits of incremental models?
Benefits of Incremental Models
You can shorten the run times of your data models using incremental models. The performance of your data warehouse improves and you save money on computing costs when your run time is decreased.
Using incremental logic can also save a significant amount of time and compute resources. This is true especially when you have “elephant” tables or hundreds of dbt models operating regularly and concurrently.
For instance, suppose a dbt transformation on a table runs every X hours and takes 10 minutes to execute. With incremental logic, the entire table no longer needs to be reprocessed, so the same run takes only 20 seconds. At six runs a day, that is an hour of compute reduced to two minutes; consider how these savings grow over time.
Before applying incremental materialization, several dbt models might take over three hours to run every morning. That is not ideal when your business teams expect the data to be available as they start their workday.
Incremental models are defined with select statements, just like other dbt materializations, with the materialization specified in a config block.
To use incremental models, you must also tell dbt the following (a complete example follows this list):
- How to filter rows on an incremental run
Wrap valid SQL that filters for these rows in the is_incremental() macro to tell dbt which rows it should transform on an incremental run. You will frequently filter for “new” rows, i.e., rows that have been added since dbt last ran this model.
The best way to find the timestamp of the most recent run is to check for the most recent timestamp in your target table; dbt’s {{ this }} variable makes it simple to query the target table.
It is also typical to want to capture both new and updated records. To prevent bringing in modified records as duplicates, you must specify a unique key, covered next.
- Defining a unique key (optional)
A unique key allows dbt to update existing rows instead of only appending new ones. If new data arrives for an existing unique key, it can replace the old data rather than being added to the table as a second row, and an exact duplicate row can be ignored. For other ways to control this update behavior, such as updating only specific columns, see dbt’s strategy-specific configurations.
If a unique key is not specified, the behavior is append-only: every row returned by the model’s SQL is inserted into the existing target table, whether or not it is a duplicate.
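Putting these pieces together, here is a minimal sketch of an incremental model. The model and column names (stg_orders, order_id, updated_at) are hypothetical; the config block, the is_incremental() macro, and the {{ this }} variable are standard dbt:

```sql
{{
    config(
        materialized='incremental',
        unique_key='order_id'   -- enables upserts instead of append-only inserts
    )
}}

select
    order_id,
    customer_id,
    order_status,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- On incremental runs, only pull rows created or updated since the
  -- last run; {{ this }} refers to the existing target table.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run (or a full refresh), the is_incremental() block is skipped and the whole table is built; on every later run, only the filtered rows are merged in.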
Finally, let’s go through some tips and tricks to apply incremental logic.
Best Practices for Applying the Incremental Logic
To use dbt incremental models in the best way, you need the following tips and tricks.
Adding an If Not Incremental Statement
Adding this if statement to your data models can be incredibly helpful when an incremental model has to serve a wider range of goals or problems. The condition is only satisfied when the model runs in full-refresh mode (or on its very first build).
An excellent example is a model that combines a static historical source with a current source of data. The historical source won’t change, so there is no need to re-run dbt transformations on it constantly.
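As a sketch of that pattern, assuming hypothetical staging models stg_current_events and stg_historical_events:

```sql
{{ config(materialized='incremental') }}

select * from {{ ref('stg_current_events') }}

{% if not is_incremental() %}
  -- Only included on the first run or a full refresh: the historical
  -- source is static, so incremental runs skip scanning it entirely.
  union all
  select * from {{ ref('stg_historical_events') }}
{% endif %}
```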
Configuring dbt Pre-hook
When used correctly, the dbt pre-hook (or post-hook) configuration can be a very helpful tool. A pre-hook runs one or more SQL statements that you define against your warehouse immediately before your dbt model runs, for example to prepare the target table.
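For instance, a pre-hook can clear out today’s partition before the model re-loads it. A minimal sketch, assuming a daily-reload pattern with a hypothetical event_date column and stg_events model (note that this hook assumes the target table already exists, i.e., after the initial build):

```sql
{{
    config(
        materialized='incremental',
        pre_hook="delete from {{ this }} where event_date = current_date"
    )
}}

select event_id, event_date, payload
from {{ ref('stg_events') }}
where event_date = current_date
```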
Introducing Macros/Variables Instead of Full Refreshes
Instead of completely refreshing the target table, you can always build a flexible dbt variable or a macro to reprocess a portion of your data.
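One sketch of this idea, using a hypothetical lookback_days variable with a default of three days (the dateadd syntax here is Snowflake/Redshift-style and varies by warehouse):

```sql
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, order_status, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Reprocess a configurable lookback window instead of the whole table.
  where updated_at >= dateadd(day, -{{ var('lookback_days', 3) }}, current_date)
{% endif %}
```

To reprocess the last 30 days without a full refresh, override the variable at run time (the model name here is hypothetical):

```bash
dbt run --select orders --vars '{"lookback_days": 30}'
```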
Conclusion
A dbt incremental model is a type of data model that only processes fresh or updated data rather than reprocessing the complete dataset on every run. You can use one when source data is huge and models take too much time to run. It is also useful when historical data doesn’t change and table updates are needed frequently.
The main benefits of dbt incremental models are shorter model run times, and with them better warehouse performance and lower compute costs.
Once you go through all the details, you will understand whether it’s a good fit for your requirements. Go ahead and make a decision then!
You can simplify your task with a cloud-based ELT solution like Hevo Data, which automates the data integration process for you and runs your dbt projects to transform data in your data warehouse. At this time, dbt Core™ on Hevo is in BETA. Please reach out to Hevo Support or your account executive to enable it for your team.
Offering 150+ plug-and-play integrations and saving countless hours of manual data cleaning and standardizing, Hevo Data also offers in-built pre-load data transformations that get it done in minutes via a simple drag-and-drop interface or your custom Python scripts.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Share your experience of learning about dbt incremental models! Let us know in the comments section below!