After the hustle and bustle of extracting data from multiple sources, you have finally loaded all your data into a single source of truth like the Snowflake data warehouse. However, data modeling remains a critical and challenging step: transforming that raw data into an analysis-ready form you can actually draw insights from. Data Build Tool (dbt) offers an easy yet effective way to do just that!

By setting up the dbt Snowflake connection, you can easily apply business logic using simple SQL statements. But how do you get started with this integration? How do you build dbt data pipelines and successfully deploy them to production? Look no further! This nifty 7-minute article will answer all your questions about the dbt Snowflake integration.

The new workflow with dbt and Snowflake isn’t a small improvement. It’s a complete redesign of our entire approach to data that will establish a new strategic foundation for analysts at JetBlue to build on.

— Ben Singleton, Director of Data Science & Analytics at JetBlue

What is dbt?


Data Build Tool (dbt) is a transformation tool that helps you write, test, and deploy transformations on the data in your data warehouse. Using simple SELECT statements, you can model your data before using it for analytics. dbt offers multiple features that make the data transformation process seamless for all use cases:

  • Git integration provides version control for all your dbt projects, including models, tests, sources, packages, and other configurations.
  • dbt automatically generates and updates documentation as models are created and modified.
  • Along with pre-built tests, dbt lets you write and run custom tests using Jinja and SQL (see the sketch after this list).
  • With dbt’s lineage graphs, you get better visibility into how data maps back to business logic.
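For instance, a custom generic test is just a parameterized SQL query that should return zero rows. Here is a minimal sketch; the test name is_positive and the tests/generic/ location are illustrative, not from this tutorial:

{% test is_positive(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} <= 0  -- any rows returned here make the test fail

{% endtest %}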

What is Snowflake?


Snowflake is a cloud-based data warehousing platform that lets businesses store, analyze, and share data in a secure and scalable way. It supports structured, semi-structured, and unstructured data, and its architecture separates storage from compute, enabling faster query processing and independent scaling of each. A popular choice among businesses across the globe, Snowflake provides strong data security through encryption, user authentication, and role-based access control.
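Because compute is decoupled from storage, you can resize or pause a virtual warehouse at any time without touching the underlying data. A quick illustration in Snowflake SQL; the warehouse name transforming is just an example:

alter warehouse transforming set warehouse_size = 'LARGE';  -- scale compute up for a heavy run
alter warehouse transforming suspend;  -- stop paying for compute when idle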

dbt with Snowflake

dbt promotes collaboration among data teams to set up end-to-end data transformation workflows over Snowflake using SQL. The benefits of the dbt Snowflake integration include:

  • You get complete control over the warehouse size used for each of your dbt models (see the config sketch after this list). By right-sizing compute per model, you can improve run times, manage costs, and meet internal data freshness SLAs.
  • dbt provides separate development and production environments, reducing potential downtime.
  • Better access control over sensitive data with dynamic data masking.
  • Effectively manage administrative tasks using dbt hooks and operations macros.
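For example, dbt’s Snowflake adapter exposes a snowflake_warehouse model config that overrides the warehouse for a single model. A minimal sketch, where the model file and the warehouse name transforming_xl are hypothetical:

-- models/marts/some_large_model.sql
{{ config(snowflake_warehouse = 'transforming_xl') }}  -- run only this model on a larger warehouse

select * from {{ ref('stg_some_table') }}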

How to set up dbt and Snowflake?

To get started with the dbt Snowflake integration, you can follow the easy steps given below:

  • Step 1: If you don’t already have a Snowflake account, create a free trial account for your new dbt Snowflake connection. Choose the Enterprise edition, pick a cloud provider and region, and agree to the terms of service. Finally, set up your account password, and your new Snowflake account is ready.
  • Step 2: For loading your data, click on the “+ Worksheet” button in the upper right-hand corner of the screen to create a new worksheet.
  • Step 3: As a dbt Snowflake example, this article uses the sample jaffle shop and stripe data stored in an S3 bucket as CSV files. First create a virtual warehouse, databases, and schemas in Snowflake, then load the data using the following commands.
create warehouse transforming;  -- compute for running dbt transformations
create database raw;            -- landing zone for source data
create database analytics;      -- home for transformed, analysis-ready models
create schema raw.jaffle_shop;
create schema raw.stripe;

Now create the customers table and load the data from S3:
create table raw.jaffle_shop.customers 
( id integer,
  first_name varchar,
  last_name varchar
);

copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );
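Once the copy completes, you can run a quick sanity check on the loaded table, and repeat the same create table/copy into pattern for the remaining jaffle shop and stripe files:

select count(*) from raw.jaffle_shop.customers;  -- should match the row count of the CSV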
  • Step 4: In the new Snowflake UI, click the home icon in the upper left-hand corner, click the Admin tab, and then Partner Connect.
  • Step 5: Select the RAW and ANALYTICS databases to grant your new dbt user role access to each database. Click the Connect button, then click the Activate button. After filling in your account name and password, your dbt Cloud registration for the dbt Snowflake connection is complete.
  • Step 6: In the final step of the dbt Snowflake integration setup, click the gear icon in the upper right and select Account Settings. Choose the “Partner Connection Trial” project and select Snowflake in the overview table. Click Edit and update the database and warehouse fields to analytics and transforming, respectively.

Build dbt Data Pipelines

To fully understand the dbt Snowflake integration, let’s consider an example: analyzing the trading performance of a firm with trading desks spread across various regions. This article uses the datasets in the Knoema Economy Data Atlas, available in the Snowflake Data Marketplace. Here’s how you can get started building Snowflake dbt data pipelines:

  • Step 1: First, create a knoema_sources.yml file in the staging/knoema folder to declare your dbt sources. Paste the following code into your dbt Cloud IDE, then click the Commit button.
version: 2

sources:
  - name: knoema_economy_data_atlas
    database: knoema_economy_data_atlas
    schema: economy
    tables:
      - name: exratescc2018
      - name: usindssp2020
  • Step 2: Create staging models to clean up the raw objects. Because these are the first-level transformations, they give you a consistent foundation on which to build more complex transformations. Create a new file in the staging/knoema folder named stg_knoema_fx_rates.sql and paste in the following select statement. Save the file and click Preview to check your query before execution.
with source as (

    select * from {{ source('knoema_economy_data_atlas', 'exratescc2018') }}

),

renamed as (

    select
        "Currency" as currency,
        "Currency Unit" as currency_unit,
        "Frequency" as frequency,
        "Date" as exchange_date,
        "Value" as exchange_value,
        "Indicator" as indicator,
        "Indicator Name" as indicator_name,
        'Knoema.FX Rates' as data_source_name

    from source

)

select * from renamed
  • Step 3: Run the following command to execute your staging models. After a successful run, save your work and click Commit with a good commit message.
dbt run --select staging.*
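dbt’s node selection syntax also lets you run a single model, or a model plus everything downstream of it:

dbt run --select stg_knoema_fx_rates
dbt run --select stg_knoema_fx_rates+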
  • Step 4: Now you can create intermediate models for your project based on your business requirements; to learn how to add these models, check out the sample Snowflake code (a sketch of one such model appears after this step). You can also generate your documentation site: execute the following command and click the View Docs option:
dbt docs generate
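As a rough idea of what such an intermediate model could look like, here is a hypothetical intermediate/int_fx_rates.sql that narrows the staging model down to one row per currency and date; the filter values are assumptions, not the exact sample code:

with fx_rates as (
    select * from {{ ref('stg_knoema_fx_rates') }}
)

select *
from fx_rates
where indicator_name = 'Close'  -- assumption: keep only closing rates
  and frequency = 'D'           -- assumption: keep only daily observations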

Test and Deploy dbt Data Pipelines

dbt comes with a set of four pre-built data tests: uniqueness, not-null, accepted values, and relationship integrity. For documentation, dbt picks up model and column descriptions and adds them to the documentation site, so you have even more information about your models to share with your stakeholders.

For dbt Snowflake testing, create a sample file intermediate.yml, add the following code, and execute the command “dbt test”.

version: 2
 
models:
  - name: int_fx_rates
    description: "An intermediate model that filters stg_knoema_fx_rates"
    columns:
      - name: currency||exchange_date
        tests:
          - unique
          - not_null
 
  - name: int_unioned_book
    description: "An intermediate model unions the manual_book csvs"
    columns:
      - name: instrument
        tests:
          - not_null
          - relationships:
              to: ref('int_knoema_stock_history')
              field: company_symbol
 
  - name: int_knoema_stock_history
    description: "An intermediate model that pivots the stg_knoema_stock_history model by indicator"
    columns:
      - name: company_symbol||stock_date
        tests:
          - not_null
          - unique
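While iterating, you can also scope the run to a single model instead of testing everything at once:

dbt test --select int_fx_rates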

If any test fails, you can return to your dbt Snowflake models, modify them, and re-run the tests. After you have committed all your work, you can start the dbt Snowflake deployment process by following these steps:

  • Step 1: Click on the merge to master button.
  • Step 2: Now click the hamburger icon and go to Environments > Deployment > Settings.
  • Step 3: Click the Edit option, change the schema name from your dbt Snowflake development schema to production, and click the Save button.
  • Step 4: Click the hamburger menu on the top left and click Jobs to update the default job created by Partner Connect.
  • Step 5: Choose the pre-configured job named Partner Connect Trial Job and go to Settings.
  • Step 6: Click Edit on the top right and change the name of the job to Production Job. Click the Save button to save all the changes. You can now click the job name in the path at the top and click Run now to complete the deployment of your dbt Snowflake models to production.

Snowflake dbt Best Practices

To get the best out of the dbt Snowflake integration, follow the good practices listed below (a SQL sketch of these follows the list):

  • Set up your credentials to use a key pair.
  • Create both a role and a warehouse for dbt to use. This is important because the dbt role holds the rights needed to create and read the various objects, and a separate warehouse keeps your dbt transformations isolated in terms of compute.
  • Create a dbt service account for any dbt Snowflake automation, so the account does not get removed when an individual leaves the company.
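Here is a rough Snowflake SQL sketch of these practices; every object name and the key value are illustrative placeholders:

create role dbt_role;                                                -- dedicated role for dbt
create warehouse dbt_wh warehouse_size = 'XSMALL' auto_suspend = 60; -- isolated compute for dbt
create user dbt_service_account default_role = dbt_role;             -- service account, not a person
alter user dbt_service_account set rsa_public_key = '<public-key>';  -- key-pair authentication
grant usage on warehouse dbt_wh to role dbt_role;
grant role dbt_role to user dbt_service_account;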

Final Thoughts

By setting up the dbt Snowflake connection, you can leverage the salient features of dbt for easy, scalable, and structured data transformations. With Git-enabled version control, auto-generated documentation, pre-built tests, and separate development and production environments, dbt provides a seamless data transformation process. However, you still need to pull data from all your sources and load it into your data warehouse.

For cases where you rarely need to replicate data, your engineering team can easily handle it. For frequent, high-volume data transfers from multiple sources, however, your engineering team would need to constantly monitor and fix any data leaks. Or you can simply hop onto a smooth ride with a cloud-based ELT solution like Hevo Data, which automates the data integration process for you and runs your dbt projects to transform the data present in Snowflake. At this time, dbt Core™ on Hevo is in beta; please reach out to Hevo Support or your account executive to enable it for your team.


Offering 150+ plug-and-play integrations and saving countless hours of manual data cleaning and standardization, Hevo Data also offers in-built pre-load data transformations that get the job done in minutes via a simple drag-and-drop interface or your custom Python scripts.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.

Share your experience of learning about dbt Snowflake integration! Let us know in the comments section below!
