DBT Snowflake Integration: 4 Easy Steps to Connect & Accelerate your Data

on Data Build Tool, Data Integration, Data Warehouses, Snowflake • January 20th, 2022


To empower your business decisions, you need Real-Time, High-Quality data from all of your data sources in a central repository. Traditional On-Premise Data Warehouse systems offer limited Scalability and Performance and require regular maintenance. With industry-leading Query Performance, Snowflake offers a more Cost-Effective and Instantly Scalable solution. It’s a one-stop shop for Data Warehousing and Analytics in the Cloud, with complete SQL support for Data Analysis and Transformations.

DBT (Data Build Tool) is a Python-based Transformation Tool that focuses on the Transform part of ELT (Extract, Load, Transform) operations. Its purpose is to complete Data Engineering tasks before the data is used for Analysis. DBT uses select statements to perform transformations, which are then materialized as Tables and Views, resulting in a straightforward and effective transformation process.

Suppose you’re a Snowflake user in search of a Data Transformation Tool. Since DBT can organize Data Modeling and Data Transformation operations, it fits nicely into the list of options: it transforms data that has already been loaded into your Data Warehouse, handling the ‘T’ in ‘ELT’ (Extract, Load, Transform).

In this blog post, you will learn how to set up the DBT Snowflake Integration, along with the different ways to authenticate the connection.


What is Snowflake?


Snowflake is a fully managed Cloud Data Warehouse solution, built on the infrastructure of a Cloud provider (AWS, Azure, or GCP) of the customer’s choice. Snowflake SQL is consistent with the ANSI standard and includes typical Analytics and Windowing capabilities. You’ll notice certain differences in Snowflake’s syntax, but much of it will feel familiar.

Snowflake’s Integrated Development Environment (IDE) is totally Web-based. To interact with your instance, go to XXXXXXXX.us-east-1.snowflakecomputing.com. After logging in, you’ll be sent to the primary Online GUI, which serves as the main IDE, where you can begin interacting with your data assets. Additionally, each query tab in the Snowflake interface is referred to as a “Worksheet” for convenience. These “Worksheets” are saved automatically and can be accessed at any time, similar to the tab history feature.

Key Features of Snowflake

  • Caching: Snowflake uses a Caching Mechanism such that when the same query is issued again, the results are rapidly returned from the cache. To avoid re-generation of the report when nothing has changed, Snowflake employs persistent (within the session) query results.
  • Query Optimization: Snowflake can optimize a query on its own by Clustering and Partitioning. There’s no need to be concerned about Query Optimization.
  • Secure Data Sharing: Using Snowflake Database Tables, Views, and UDFs, data can be shared from one account to another.
  • Support for Semi-Structured File Formats: Snowflake supports the import of semi-structured data such as JSON, Avro, ORC, Parquet, and XML. Its VARIANT column type lets you store and query semi-structured data (see the example after this list).
  • Standard and Extended SQL Support: Snowflake provides good ANSI SQL support, as well as advanced SQL features including Merge, Lateral View, Statistical Functions, and many others.
  • Fault-Tolerant: In the event of a failure, Snowflake delivers exceptional fault-tolerance capabilities to recover Snowflake objects (tables, views, databases, schemas, and so on).
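To illustrate the VARIANT column type mentioned above, here is a minimal Snowflake SQL sketch (table, column, and attribute names are hypothetical) that loads a JSON document and queries its nested attributes:

-- Create a table with a VARIANT column for semi-structured data
create or replace table raw_events (payload variant);

-- Load a JSON document into the VARIANT column
insert into raw_events
select parse_json('{"event": "signup", "user": {"id": 42, "country": "DE"}}');

-- Query nested attributes using Snowflake's path notation, casting to SQL types
select
    payload:event::string        as event_name,
    payload:user.id::number      as user_id,
    payload:user.country::string as country
from raw_events;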

For further information, check out Snowflake’s official website.

What is DBT (Data Build Tool)?


DBT (Data Build Tool) is a Python-based Transformation Tool that concentrates on the Transform element of ELT (Extract, Load, Transform) procedures. Its goal is to execute Data Engineering tasks before the data is used for Analytics. DBT uses select statements to execute transformations, which are then converted into Tables and Views, making the transformation process simple and effective.

DBT is available in two forms:

  • DBT CLI: A Command-Line Interface that runs in a terminal.
  • DBT Cloud: A web-based solution with an Integrated Development Environment.

Key Features of DBT

  • Accurate Documentation: DBT’s Documentation is updated as new models are generated, and it remains freely accessible throughout development. The documentation is generated based on the Specified Descriptions, Model Dependencies, SQL Files, Sources, and Configured Tests.
  • Version Management: Thanks to DBT’s Git integration, all of a Project’s Models, Tests, Sources, Packages, and other Configurations are versioned in the associated Git repository.
  • Testing: DBT includes prebuilt tests for uniqueness, non-nullity, referential integrity, and accepted values. Apart from the Default Tests, Jinja and SQL can be used to construct Custom Test Cases to offer further testing functionality.
  • Data Refreshes in DBT Cloud: DBT Cloud can schedule data refreshes in the production environment according to business requirements, without the need for a separate orchestration tool.

What makes DBT so effective?

As a DBT user, your primary focus will be on writing models (i.e. select queries) that reflect core business logic – you won’t need to write boilerplate code to create tables and views, or to specify the order in which your models run. Instead, DBT takes care of converting these models into objects in your warehouse.

To materialize queries as relations, DBT handles boilerplate code. You can easily configure a materialization for each model you create.

Materialization is a build strategy for your select query; the code behind it is robust, boilerplate SQL that wraps your select query in a statement to create or update a relation.

The following materializations are pre-installed in DBT:

  • view (default): In the database, the model is created as a view.
  • table: In the database, the model is stored as a table.
  • ephemeral: The model isn’t created in the database directly, but rather as common table expressions in dependent models.
  • incremental: The model is initially built as a table; in subsequent runs, DBT inserts new rows and updates changed rows.

If necessary, custom materializations can be created.
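For example, here is a minimal model sketch (model and table names are hypothetical) showing how a materialization is configured directly in the model file; changing the single config value switches the build strategy:

-- models/orders_summary.sql (hypothetical model and table names)
{{ config(materialized='table') }}  -- change to 'view', 'incremental', or 'ephemeral' as needed

select
    order_id,
    sum(amount) as total_amount
from raw_data.payments              -- a raw table already loaded into the warehouse
group by order_id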

The order in which models are executed is determined by DBT. When it comes to data transformation, it’s common to take a staged approach. Through the ref function, DBT provides a mechanism for implementing transformations in stages.

Rather than selecting directly from the existing tables and views in your warehouse, you can select from another model, as shown below:

select
  orders.id,
  orders.status,
  sum(case when payments.payment_method = 'bank_transfer' then payments.amount else 0 end) as bank_transfer_amount,
  sum(case when payments.payment_method = 'credit_card' then payments.amount else 0 end) as credit_card_amount,
  sum(case when payments.payment_method = 'gift_card' then payments.amount else 0 end) as gift_card_amount,
  sum(payments.amount) as total_amount

from {{ ref('base_orders') }} as orders
left join {{ ref('base_payments') }} as payments on payments.order_id = orders.id
group by 1, 2

The model specified in the “ref” function is replaced by the relation name when compiled to executable SQL by DBT.
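To make this concrete, here is a minimal sketch (model, database, and schema names are assumptions) of a model that uses ref, together with the SQL that DBT would compile it into:

-- models/order_count.sql (hypothetical model using ref)
select count(*) as order_count
from {{ ref('base_orders') }}

-- After `dbt compile`, the ref is replaced with the fully qualified relation name,
-- assuming the project builds into the analytics_db database and analytics schema:
select count(*) as order_count
from analytics_db.analytics.base_orders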

Importantly, the ref function is also used by DBT to determine the order in which the models should be executed; in the example above, base_orders and base_payments must be built before the orders model can be built.

DBT builds a Directed Acyclic Graph (DAG) of your project: each node of the graph represents a model, and the edges between the nodes are defined by “ref” functions, with a model specified in a ref function being recognized as a predecessor of the current model.

Models are executed in the order specified by the DAG when DBT runs; there is no need to specify the order of execution of your models explicitly. A single transformation (for example, renaming a column) can be shared as a predecessor for a number of downstream models, reducing the need to repeat SQL.

What else does DBT have to offer?

DBT comes with a number of extra features that make it even more powerful, such as:

  • Code Compiler: SQL files in DBT can include Jinja, a lightweight templating language. Using Jinja in SQL allows you to use control structures in your queries, such as if statements and for loops. It also allows for the sharing of SQL commands via macros.
  • Documentation: Your DBT models’ documentation can be written, versioned, and shared using DBT. For each model and field, you can write descriptions (in plain text or markdown). These descriptions, along with other implicit information (such as the model lineage or the field data type and tests used), can be turned into a website and shared with your entire team, providing an easily accessible databook for anyone who works with DBT models.
  • Tests: Because the underlying data is always changing, SQL can be difficult to test. By asserting the results generated by a model, DBT provides a way to improve the integrity of the SQL in each model. Out of the box, you can check whether a column in a model contains only (see the schema.yml sketch after this list):
    • Non-null values
    • Unique values
    • Values that have a corresponding value in another model (for example, a customer_id on an order corresponds to an id in the customers model)
    • Values from an accepted list

Any assertion you can make about your model in the form of a select query can be turned into a test, and tests can easily be extended to suit business logic specific to your organization.

  • Package management: DBT includes a package manager that allows analysts to create and publish public and private DBT code repositories that can then be referenced by others. To get started, analysts can use libraries that provide commonly-used macros, such as “DBT utils,” or dataset-specific projects for software services like Snowplow and Stripe.
  • Seed file loader: Raw values are frequently mapped to a more readable value (e.g., converting a country code to a country name) or enriched with static, or infrequently changing data in analytics (e.g. using revenue targets set each year to assess your actuals). Seed files are data sources that can be saved as a CSV file in your project and loaded into your data warehouse using the seed command.
  • Data snapshots: Records in a data source are frequently mutable, meaning they change over time. If you want to reconstruct historical values, this can be difficult to handle in analytics. Through the use of snapshots, DBT provides a mechanism for capturing raw data at a specific point in time.
  • Understand raw data sources: Because DBT selects from raw data already in your data warehouse, it’s important to understand the purpose of these tables and schemas. When you define raw data as a Source in DBT, you can:
    • Document and test the data that is loaded into your warehouse.
    • Check the freshness of your data against specified limits, to help identify upstream issues.
    • Understand which models depend on which data sources.
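Below is a minimal schema.yml sketch (model, source, and column names are hypothetical) that combines the built-in tests and a documented, freshness-checked source described in the list above:

version: 2

sources:
  - name: payments_app            # raw schema loaded by your EL tool
    database: raw
    tables:
      - name: payments
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}

models:
  - name: base_orders
    description: "One row per order"
    columns:
      - name: id
        description: "Primary key of the order"
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('base_customers')
              field: id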

Who is DBT appropriate for?

Anyone who interacts with a data warehouse should use DBT. Data engineers, data analysts, and data scientists, as well as anyone who knows how to write SQL select queries, can use it.

If you’re new to programming, you’ll probably want to spend some time learning the basics of the command line and getting acquainted with git.

To get the most out of DBT, you should be familiar with some programming fundamentals, such as for loops and if statements, so you can use Jinja effectively in your models.
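As a small sketch of what that looks like in practice (the payment method list and model names are hypothetical), the Jinja loop below generates one aggregated column per payment method:

-- models/payment_amounts.sql (hypothetical)
{% set payment_methods = ['bank_transfer', 'credit_card', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end) as {{ method }}_amount
    {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('base_payments') }}
group by 1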

What are the Benefits and Drawbacks of DBT?

Benefits:

  • Free, open-source, and well-documented software.
  • Simple to set up and use, since it is SQL-based.
  • Makes data engineers and analysts more productive.
  • Increases collaboration and re-use.
  • Data quality testing and integrated quality assurance.
  • Documentation is generated automatically.
  • Enforces standards across all data pipelines.
  • Creates searchable data lineage and catalog information.

Drawbacks:

  • Although there is a graphical user interface (DBT Cloud), the workflow is primarily text-based.
  • The focus is primarily on in-warehouse data transformation, the ‘T’ of ELT.

What Are the DBT Components?

Everything in DBT is stored in one of three file types: .sql, .yml, or .md. Working backwards:

  • .md: Documentation files. They’re markdown files used to create model documentation.
  • .yml: Configuration files. From configuring the project (dbt_project.yml) to detailed configuration and testing in a schema.yml file, these files handle it all. Because DBT believes in Configuration-as-Code (CaC), most .yml files contain Jinja templating or model template parameters. Any configuration can be overridden (and frequently is) in a .sql file to meet design and performance requirements.
  • .sql: Model and test files. Models are SELECT statements that result in a new (or incrementally updated) table or view in the data warehouse; they can even be ephemeral, existing only while the pipeline is running. These files are compiled into SQL scripts that can be run on the specified database. In Snowflake, ephemeral models are effectively temporary tables because they are nested as common table expressions in the compiled SQL.

After installing DBT using pip (or pipenv, as we prefer), ‘dbt init’ creates a new project.
A DBT project contains many different ‘modules’, a few of which are listed below:

  • Models: This directory holds the .sql files that make up the ELT pipeline. When they’re executed, they create a table, view, or ephemeral (temporary) relation from the data. By referencing other models, they create a chain of execution stages.
  • Tests: These are .sql files used to run custom data tests. For example, if you want to ensure that a certain percentage of values in a certain column are within a certain range, you’d write a model to test this assumption.
  • Macros: These are Jinja-templatized .sql files. They should be highly reusable (think DRY, Don’t Repeat Yourself) and play an important role in a complete CaC solution. They fill the role that functions play in a software engineering solution: reusable building blocks for larger pieces of logic.
  • Analysis: These are models that do not run when the project is deployed. Instead, they’re intended to be ‘on-demand’ queries that are most often associated with specialized views or tables in analytics solutions, such as a data science pipeline.
  • Docs: These are markdown files used to document models (with special decorations — see official documentation). These are optional, but in any production environment, they should be used.
  • Logs: These are the run logs. They’re created when you run the DBT run command.
  • Target Directories: These are created when compiling, running, or creating documentation. They hold all of the metadata as well as the SQL code that has been compiled.
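These modules are wired together in dbt_project.yml. A minimal sketch is shown below (project and directory names are assumptions; the profile name matches the profiles.yml examples later in this article), including a folder-level materialization default:

# dbt_project.yml (minimal sketch)
name: 'snowflake_analytics'
version: '1.0.0'
profile: 'my-snowflake-db'     # must match a profile defined in profiles.yml

model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
analysis-paths: ["analyses"]
snapshot-paths: ["snapshots"]

models:
  snowflake_analytics:
    staging:
      +materialized: view      # everything under models/staging builds as a view
    marts:
      +materialized: table     # marts are persisted as tables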

Why Should you Integrate DBT Snowflake?


Snowflake is a Software-as-a-Service (SaaS) Data Warehouse that is faster, more user-friendly, and more adaptable than a traditional Data Warehouse. It is built on a SQL Database Engine with a Cloud-specific Architecture. Snowflake is designed around a multi-cluster, shared Data Architecture, which separates the Data Storage, Compute, and Query Processing layers. DBT is an open-source tool that can be used with Snowflake to manage the Transformation step of the ELT load. The DBT Snowflake connection provides the following benefits:

  • Converting Tables to Views: To convert a table to a view, simply update the materialization in a single config file.
  • Stored Procedures: Logic that would otherwise live in stored procedures is developed as DBT models, which can be easily viewed and modified.
  • Combining Transformation Logic: DBT Tags can be used to group together comparable Transformation Logic.
  • Version Control: DBT integrates with GitHub to provide version control.
  • Open-Source Community: Rather than starting from scratch, development can be accelerated by drawing on the shared experience of other developers.

Simplify Snowflake Data Transfer with Hevo’s No-code Pipeline

Hevo Data is a No-code Data Pipeline that helps you transfer data from 100+ sources (including 40+ Free Data Sources) to Snowflake in real-time in an effortless manner. After using Hevo you can easily carry out tasks like DBT Snowflake Integration.

Get Started with Hevo for Free

Key Features of Hevo Data:

  • Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
  • Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease. 
  • Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
  • Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer. 
  • Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs. 
  • Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
  • 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.

Steps to load Snowflake data using Hevo Data:

  • Sign up on Hevo Data and select Snowflake as the destination.
  • Provide the user credentials and connect to the server.
  • Select the database and schema to load the data into.
Sign up here for a 14-Day Free Trial!

How to establish DBT Snowflake Connection? 

Step 1: Create a Snowflake Account

The first step in DBT Snowflake Integration is to create a Snowflake account and take note of the account details (such as the account identifier, username, role, database, and warehouse), which will be used to establish the DBT Snowflake connection.


Step 2: Create a DBT Account

Try DBT Cloud Free to create an account with DBT. Create a new project once you’ve logged in. Follow the steps below to set up the newly created project.


To get started, click the start button. On the next page, select a suitable DBT Project Name and click Continue. On the next screen, select Snowflake from the list of Data Warehouses.

Step 3: Configure DBT Snowflake Connection

When configuring a DBT Snowflake connection, the following fields must be filled in:

  • Account: The Snowflake account to connect to. If the Snowflake account URL is abc12345.east-us-2.azure.snowflakecomputing.com, the Account name is abc12345.east-us-2.azure.
  • Role: The role to assume after connecting to Snowflake. (This is an optional field.)
  • Database: The Logical Database in the Data Warehouse in which DBT will run queries and build models.
  • Data Warehouse: The Virtual Warehouse to use when running queries.
  • Auth Methods: There are two Auth methods: 
    • Username / Password: Enter the Snowflake username (particularly, the login name) and the matching user’s Snowflake password to authorize DBT Cloud to perform Snowflake queries on behalf of a Snowflake user.
    • Key Pair: When used from a DBT Cloud project, the Keypair auth method uses Snowflake’s Key Pair Authentication to authenticate the credentials. For the authentication process, rsa_public_key should be configured for the Snowflake user after generating an encrypted key pair.
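For the Key Pair method, the generated public key must be attached to the Snowflake user that DBT Cloud connects as. A minimal Snowflake SQL sketch, run by an account administrator (the user name and key value below are placeholders):

-- Attach the RSA public key to the Snowflake user used by DBT Cloud
alter user dbt_cloud_user set rsa_public_key = 'MIIBIjANBgkqh...';

-- Verify that the key was registered
desc user dbt_cloud_user;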

Test the DBT Snowflake connection once the credentials have been provided. Connect DBT to an empty GIT repository after a successful connection.

Step 4: Connect DBT with your GitHub Repository

Next, connect DBT to your GitHub repository. Create a new repository on GitHub with the same email address that you used to create your DBT Cloud account. On the setup repository screen, click GitHub.

Click the link to your GitHub Account on the integrations page.

Select “Only select repositories”, choose the newly created DBT repository, and click Install. You will be routed to the Integrations page after installation. To return to the rest of your account, open the hamburger menu in the top-left corner of your DBT page, click on the Home Page, and click Continue.

The link to your repository will be highlighted below if you click on GitHub again. Click on that repository, then proceed once it has been successfully connected.

You have now successfully established DBT Snowflake Integration!

Ways to Authenticate your DBT Snowflake Integration

All Configurations 

The “basic” Snowflake Target Configs are shown below. Note that you should additionally supply the auth-related configs appropriate to the authentication method you’re using, as described in the sections that follow.

  • account (Required): The account to connect to, as per Snowflake’s documentation.
  • user (Required): The user to log in as.
  • database (Required): The database that DBT should create models in.
  • warehouse (Required): The warehouse to use when building models.
  • schema (Required): The schema to build models into by default. Can be overridden with custom schemas.
  • role (Optional, but recommended): The role to assume when running queries as the specified user.
  • client_session_keep_alive (Optional): If provided, issue a periodic select statement to keep the connection open when particularly long-running queries are executing (> 4 hours). Default: False.
  • threads (Optional): The number of concurrent models DBT should build. Set this to a higher number if using a bigger warehouse. Default: 1.
  • query_tag (Optional): A value with which to tag all queries, for later searching in the QUERY_HISTORY view.
  • retry_all (Optional): A boolean flag indicating whether to retry on all Snowflake connector errors.
  • retry_on_database_errors (Optional): A boolean flag indicating whether to retry after encountering errors of type snowflake.connector.errors.DatabaseError.
  • connect_retries (Optional): The number of times to retry after an unsuccessful connection.
  • connect_timeout (Optional): The number of seconds to sleep between failed connection retries.

The above configurations are used with all three of the following authentication methods.

1) Key Pair Authentication 

If you want to employ Key Pair Authentication, leave out the password and instead give your target a private_key_path and, optionally, a private_key_passphrase. Note that prior to version 0.16.0, private keys had to be encrypted and a private_key_passphrase had to be provided; this behavior was changed in DBT v0.16.0.

my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]
      user: [username]
      role: [user role]

      # Keypair config
      private_key_path: [path/to/private.key]
      private_key_passphrase: [passphrase for the private key, if key is encrypted]

      database: [database name]
      warehouse: [warehouse name]
      schema: [dbt schema]
      threads: [1 or more]
      client_session_keep_alive: False
      query_tag: [anything]

      # optional
      connect_retries: 0 # default 0
      connect_timeout: 10 # default: 10
      retry_on_database_errors: False # default: false
      retry_all: False # default: false

2) Password or User Authentication

As demonstrated below, Snowflake can be configured using basic User/Password Authentication.

my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]

      # User/password auth
      user: [username]
      password: [password]

      role: [user role]
      database: [database name]
      warehouse: [warehouse name]
      schema: [dbt schema]
      threads: [1 or more]
      client_session_keep_alive: False
      query_tag: [anything]

      # optional
      connect_retries: 0 # default 0
      connect_timeout: 10 # default: 10
      retry_on_database_errors: False # default: false
      retry_all: False # default: false

3) SSO Authentication 

If you want to use SSO authentication with Snowflake, don’t use a password and instead give your target an authenticator config. ‘externalbrowser’ or a valid Okta URL can be used as an authenticator.

my-snowflake-db:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: [account id]

      # SSO auth
      user: [username]
      authenticator: [externalbrowser or a valid Okta URL]

      role: [user role]
      database: [database name]
      warehouse: [warehouse name]
      schema: [dbt schema]
      threads: [1 or more]
      client_session_keep_alive: False
      query_tag: [anything]

      # optional
      connect_retries: 0 # default 0
      connect_timeout: 10 # default: 10
      retry_on_database_errors: False # default: false
      retry_all: False # default: false

Conclusion

This article has walked you through DBT Snowflake Integration and the different ways to authenticate the connection, helping you improve your overall decision-making and experience when trying to make the most out of your data. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you!

Visit our Website to Explore Hevo

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses such as Snowflake, Database, or a destination of your choice. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about DBT Snowflake Integration! Let us know in the comments section below!
