How to Setup DBT Snowflake Integration?: 4 Easy Steps

• January 20th, 2022

DBT Snowflake | Hevo Data

You need Real-Time High-Quality data from all of your data sources in a central repository to empower your business decisions with data. Traditional On-Premise Data Warehouse systems have limited Scalability and Performance, and they need to be maintained on a regular basis. With industry-leading Query Performance, Snowflake offers a more Cost-Effective and Instantly Scalable Solution. It’s a one-stop-shop for Data Warehousing and Analytics in the Cloud, with complete SQL support for Data Analysis and Transformations. In this article, you’ll learn about DBT Snowflake Integration along with different ways to authorize the connection. Let’s have a quick overview before we deep dive into the methods of successfully establishing DBT Snowflake Integration.

The Python Software DBT (Data Build Tool) is a Transformation Tool that focuses on the Transform part of ELT (Extract, Load, Transform) operations. Its purpose is to complete Data Engineering chores before analyzing the data. DBT uses select statements to perform transformations, which are subsequently translated into Tables and Views, resulting in a straightforward and effective transformation process.

Assume you’re a Snowflake user in search of a Data Transformation Tool. Then, since DBT can organize Data Modeling and Data Transformation operations, it fits nicely into the list of Transformation Tool options. DBT can transform data that has already been loaded into your Data Warehouse. It can perform the letter ‘T’ in the word ‘ELT’ (Extract, Load, Transform).

Table of Contents

What is Snowflake?

DBT Snowflake: Snowflake logo | Hevo Data
Image Source

Snowflake is the world’s first Cloud Data Warehouse solution, built on the infrastructure of a Cloud provider (AWS, Azure, or GCP) of the customer’s choice. Snowflake SQL is consistent with the ANSI standard and includes typical Analytics and Windowing capabilities. You’ll notice certain differences in Snowflake’s syntax, but you’ll also see some commonalities. 

Snowflake’s Integrated Development Environment (IDE) is totally Web-based. To interact with your instance, go to XXXXXXXX.us-east-1.snowflakecomputing.com. After logging in, you’ll be sent to the primary Online GUI, which serves as the main IDE, where you can begin interacting with your data assets. Additionally, each query tab in the Snowflake interface is referred to as a “Worksheet” for convenience. These “Worksheets” are saved automatically and can be accessed at any time, similar to the tab history feature.

Key Features of Snowflake

  • Caching: Snowflake uses a Caching Mechanism such that when the same query is issued again, the results are rapidly returned from the cache. To avoid re-generation of the report when nothing has changed, Snowflake employs persistent (within the session) query results.
  • Query Optimization: Snowflake can optimize a query on its own by Clustering and Partitioning. There’s no need to be concerned about Query Optimization.
  • Secure Data Sharing: Using Snowflake Database Tables, Views, and UDFs, data can be shared from one account to another.
  • Support to File Formats: Snowflake supports the import of semi-structured data such as JSON, Avro, ORC, Parquet, and XML. It has a column type called VARIANT that allows you to store semi-structured data.
  • Standard and Extended SQL Support: Snowflake provides good ANSI SQL support, as well as advanced SQL features including Merge, Lateral View, Statistical Functions, and many others.
  • Fault-Tolerant: In the event of a failure, Snowflake delivers extraordinary fault-tolerant capabilities to recover the Snowflake object (tables, views, database, schema, and so on).

What is DBT (Data Build Tool)?

DBT Snowflake: dbt logo | Hevo Data
Image Source

The Python Application DBT (Data Build Tool) is a Transformation Tool that concentrates on the Transformation element of ELT (Extract, Load, Transform) procedures. Its goal is to execute Data Engineering tasks before using the data for Analytics. Select statements are utilized by DBT to execute transformations, which are then converted into Tables and Views, making the transformation process simple and effective.

DBT is available in two forms.

  • DBT CLI: It is a Command Line Interface that runs on a terminal.
  • DBT Cloud:  A web-based solution with an Integrated Development Environment.

Key Features of DBT

  • Accurate Documentation: DBT’s Documentation is updated on a regular basis when new models are generated. It is also freely accessible at all times during the construction process. The documentation is generated based on the Specified Descriptions, Model Dependencies, SQL Files, Sources, and Configured Tests.
  • Version Management: Version control has been more integrated thanks to the integration of GIT and DBT. In the associated GIT repository, all of the Project’s Models, Tests, Sources, Packages, and other Configurations are versioned.
  • Testing: DBT includes prebuilt tests for uniqueness, non-nullity, referential integrity, and accepted value. Apart from the Default Tests, Jinja and SQL can be used to construct Custom Test Cases to offer further testing functionality.
  • Refreshes of Data in the DBT Cloud: Without the use of an orchestration tool, it is possible to schedule refreshes in the production environment according to business requirements using DBT.

Who is DBT appropriate for?

Anyone who interacts with a data warehouse should use DBT. Data engineers, data analysts, and data scientists, as well as anyone who knows how to write SQL select queries, can use it.

If you’re new to programming, you’ll probably want to spend some time learning the basics of the command line and getting acquainted with git.

To get the most out of DBT, you should be familiar with some programming fundamentals, such as for loops and if statements, so you can use Jinja effectively in your models.

What are the Benefits and Drawbacks of DBT?

Benefits:

  • The open-source software is both free and well-documented.
  • It’s simple to set up and use. SQL-based application
  • Data engineers and analysts will be more productive.
  • Collaboration and reuse have increased.
  • Data quality testing and integrated quality assurance
  • Documentation is generated automatically.
  • Standards are enforced for all data pipelines.
  • Creates data lineage and catalog information that can be searched.

Drawbacks:

  • Although there is a graphical user interface, it is primarily text-based.
  • The focus is primarily on database data transformation.

Why Should you Integrate DBT Snowflake?

DBT Snowflake: Integration need | Hevo Data
Image Source

Snowflake is a Software-as-a-Service (SaaS) Data Warehouse that is faster, more user-friendly, and more adaptable than a traditional Data Warehouse. It is built on a SQL Database Engine with a Cloud-specific Architecture. Snowflake is designed on a multi-cluster, shared Data Architecture, which separates the Data Storage, Compute, and Query Processing layers. DBT is an open-source technology that Snowflake uses to control the ELT load. The following benefits will be gained with the DBT Snowflake connection:

  • Converting Tables to Views: After successfully establishing DBT Snowflake. if there’s a need to convert a table to a view, you can simply update the materialization in a single config file.
  • Stored Procedures: The models display the stored procedures developed in DBT, which may be viewed and modified after the DBt SNowflake connection is set up..
  • Combining Transformation Logic: DBT uses DBT Tags to group together Transformation Logic that is comparable.
  • Version Control: DBT integrates with GitHub to provide version control.
  • Open-Source Community: Rather than beginning from scratch, the development could be aided by exchanging experiences with other developers.

Simplify Snowflake Data Transfer with Hevo’s No-code Pipeline

Hevo Data is a No-code Data Pipeline that helps you transfer data from 150+ sources (including 40+ Free Data Sources) to Snowflake in real-time in an effortless manner. After using Hevo you can easily carry out tasks like DBT Snowflake Integration.

Get Started with Hevo for Free

Key Features of Hevo Data:

  • Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag-and-drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; MySQL, SQL Server, TokuDB, DynamoDB, and PostgreSQL databases to name a few.  
  • Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
  • Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease. 
  • Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
  • Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer. 
  • Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs. 
  • Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
  • 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.

Steps to load Snowflake data using Hevo Data:

  • Sign up on the Hevo Data, and select Snowflake as the destination.
  • Provide the user credentials and connect to the server.
  • Select the database, and schema to load the data.
Sign up here for a 14-Day Free Trial!

How to establish DBT Snowflake Connection? 

Step 1: Create a Snowflake Account

The first step in DBT Snowflake Integration is to create a Snowflake account and take note of the highlighted details, which will be used to establish a DBT Snowflake connection.

DBT Snowflake: Step 1 | Hevo Data
Image Source

Step 2: Create a DBT Account

Try DBT Cloud Free to create an account with DBT. Create a new project once you’ve logged in. Follow the steps below to set up the newly created project.

DBT Snowflake: Step 2 | Hevo Data
Image Source

To get started, click the start button. On the next page, select an acceptable DBT Project Name and click Continue. Select Snowflake from the list of Data Warehouses on the next screen.

Step 3: Configure DBT Snowflake Connection

When configuring a DBT Snowflake connection, the following fields must be filled in:

DBT Snowflake: Step 3a | Hevo Data
Image Source
  • Account: For the Snowflake account to which you want to connect, the Account Name should be abc12345.east-us-2.azure if the Snowflake account URL is abc12345.east-us-2.azure.snowflakecomputing.com.
  • Role: After connecting to Snowflake, utilize this role. (This is an optional field.)
  • Database: To perform queries, build a connection to this Logical Database in the Data Warehouse.
  • Data Warehouse: The Virtual Warehouse will be used to conduct queries.
  • Auth Methods: There are two Auth methods: 
    • Username / Password: Enter the Snowflake username (particularly, the login name) and the matching user’s Snowflake password to authorize DBT Cloud to perform Snowflake queries on behalf of a Snowflake user.
    • Key Pair: When used from a DBT Cloud project, the Keypair auth method uses Snowflake’s Key Pair Authentication to authenticate the credentials. For the authentication process, rsa_public_key should be configured for the Snowflake user after generating an encrypted key pair.
DBT Snowflake: Step 3b | Hevo Data
Image Source

Test the DBT Snowflake connection once the credentials have been provided. Connect DBT to an empty GIT repository after a successful connection.

Step 4: Connect DBT with your GitHub Repository

Establish a DBT Snowflake connection to your GitHub repository. Create a new repository on GitHub with the same email address that you used to create your DBT Cloud account. On the setup repository screen, click GitHub.

  • Click the link to your GitHub Account on the integrations page.
  • Select only the selected repositories and the newly formed DBT repository, then click Install. You will be routed to the Integrations page after installation. To return to the remainder of your account, go to the hamburger menu in the top left corner of your DBT page click on the Home Page, and click on Continue.
  • The link to your repository will be highlighted below if you click on GitHub again. Click on that repository, then proceed once it has been successfully connected.

You have now successfully established DBT Snowflake Integration!

What are the Ways to Authenticate your DBT Snowflake Integration?

All Configurations 

Read along to know the ways to authenticate your DBT Snowflake Integration, The “basic” Snowflake Target Configs are shown below.

Note: You should additionally supply auth-related configs appropriate to the authentication method you’re using.

ConfigRequired?Description
accountYesThe account to connect to as per Snowflake’s documentation. See notes below
userYesThe user logs in as
databaseYesThe database that DBT should create models in
warehouseYesThe warehouse to use when building models
schemaYesThe schema to build models into by default. Can be overridden with custom schemas
roleNo (but recommended)The role to assume when running queries as the specified user.
client_session_keep_aliveNoIf provided, issue a periodic select statement to keep the connection open when particularly long-running queries are executing (> 4 hours). Default: False (see note below)
threadsNoThe number of concurrent models dbt should build. Set this to a higher number if using a bigger warehouse. Default=1
query_tagNoA value with which to tag all queries, for later searching in QUERY_HISTORY view
retry_allNoA boolean flag indicating whether to retry on all Snowflake connector errors
retry_on_database_errorsNoA boolean flag indicating whether to retry after encountering errors of type snowflake.connector.errors.DatabaseError
connect_retriesNoThe number of times to retry after an unsuccessful connection
connect_timeoutNoThe number of seconds to sleep between failed connection retries
Table Source

The above configurations are used in the 3 types of Auth methods

1) Key Pair Authentication 

If you want to employ Key Pair Authentication, leave out the password and instead give your target a private_key_path and, optionally, a private_key_passphrase. Note that prior to version 0.16.0, private keys had to be encrypted and a private_key_passphrase had to be provided. In DBTv0.16.0, this behavior was changed.

my-snowflake-db:  target: dev  outputs:    dev:      type: snowflake      account: [account id]      user: [username]      role: [user role]
      # Keypair config      private_key_path: [path/to/private.key]      private_key_passphrase: [passphrase for the private key, if key is encrypted]
      database: [database name]      warehouse: [warehouse name]      schema: [dbt schema]      threads: [1 or more]      client_session_keep_alive: False      query_tag: [anything]
      # optional      connect_retries: 0 # default 0      connect_timeout: 10 # default: 10      retry_on_database_errors: False # default: false      retry_all: False  # default: false 

2) Password or User Authentication

As demonstrated below, Snowflake can be configured using basic User/Password Authentication.

my-snowflake-db:  target: dev  outputs:    dev:      type: snowflake      account: [account id]

my-snowflake-db:  target: dev  outputs:    dev:      type: snowflake      account: [account id]
      # User/password auth      user: [username]      password: [password]
      role: [user role]      database: [database name]      warehouse: [warehouse name]      schema: [dbt schema]      threads: [1 or more]      client_session_keep_alive: False      query_tag: [anything]
      # optional      connect_retries: 0 # default 0      connect_timeout: 10 # default: 10      retry_on_database_errors: False # default: false      retry_all: False  # default: false

3) SSO Authentication 

If you want to use SSO authentication with Snowflake, don’t use a password and instead give your target an authenticator config. ‘externalbrowser’ or a valid Okta URL can be used as an authenticator.

my-snowflake-db:  target: dev  outputs:    dev:      type: snowflake      account: [account id]
      # User/password auth      user: [username]      password: [password]
      role: [user role]      database: [database name]      warehouse: [warehouse name]      schema: [dbt schema]      threads: [1 or more]      client_session_keep_alive: False      query_tag: [anything]
      # optional      connect_retries: 0 # default 0      connect_timeout: 10 # default: 10      retry_on_database_errors: False # default: false      retry_all: False  # default: false

Conclusion

This article has exposed you to the various insights on DBT Snowflake Integration to help you improve your overall decision-making and experience when trying to make the most out of your data. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you! 

Visit our Website to Explore Hevo

Hevo Data provides its users with a simpler platform for integrating data from 150+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses such as Snowflake, Database, or a destination of your choice. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin?Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about DBT Snowflake Integration! Let us know in the comments section below!

No-code Data Pipeline for Snowflake