If you’re getting started with dbt Core, you’re probably dealing with messy, scattered data that’s too raw to be useful right away. That’s where dbt (data build tool) steps in, helping analysts and engineers turn raw data into clean, structured models, ready for analysis.
Whether you’re a data professional or just getting started with analytics, this guide walks you through installing dbt Core and building your first model.
What is dbt and Why Do We Need It?
When companies collect data from multiple sources (apps, websites, CRMs, and more), it’s rarely clean. It’s inconsistent, often incomplete, and definitely not analysis-ready. dbt solves this problem by acting as the transformation layer of your ELT pipeline.
dbt, short for Data Build Tool, is a command-line tool designed to help data professionals transform raw data in their warehouse using modular SQL. Instead of writing large, unmanageable SQL scripts, dbt encourages breaking queries into reusable models that can be tested, documented, and version-controlled.
With dbt Core, you can write SQL as code, just like you would write Python or JavaScript. It brings software engineering best practices like version control, CI/CD, modularity, and automated testing to analytics engineering. Unlike traditional ETL tools like Informatica or Talend, dbt is open-source, lightweight, and uses SQL, making it highly approachable for analysts.
There are two flavors of dbt:
- dbt Core: The free and open-source CLI-based tool. Ideal for local and small-scale deployments.
- dbt Cloud: A managed service with GUI, job scheduling, and team collaboration features. Ideal for enterprise-grade deployments.
Where does dbt fit in your data stack? Right in the transformation layer. It doesn’t extract or load data, it transforms it. dbt sits between your data warehouse (like Snowflake, BigQuery, or Redshift) and your BI tools (like Looker or Tableau), helping you convert raw data into clean, analytics-ready datasets.
Use cases include:
- Building dimensional models and data marts
- Transforming and staging raw ingestion tables
- Ensuring quality with tests before dashboards break
- Automating data documentation and lineage graphs
- Enabling CI/CD in your analytics pipelines
One of the most important parts of getting started with dbt Core is familiarizing yourself with its foundational concepts, like models, tests, and macros.
dbt Core Concepts
Now that you know where dbt fits and what it’s good for, let’s break down its core building blocks. These concepts form the foundation of every dbt project:
- Models: These are SQL files that define transformations. Think of them like views or tables built from raw data.
- ref(): Instead of hardcoding table names, you use ref('model_name') to link models. This helps dbt understand dependencies and build everything in the correct order (see the example after this list).
- Tests: You can write dbt tests to check that IDs are unique, values aren’t missing, and relationships between tables hold (referential integrity).
- Documentation: You can document models and columns directly in dbt. Then, generate a beautiful visual site with lineage graphs.
- Macros: Reusable SQL snippets using Jinja templates. They help you avoid repeating code across models.
- Seeds: CSV files that dbt can load into your warehouse. Great for small reference tables.
- Snapshots: Used to track historical changes in records over time.
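To make ref() concrete, here is a minimal sketch of a mart model that builds on a hypothetical staging model called stg_orders (all names here are just examples, not part of any starter project):

-- models/marts/orders_summary.sql (hypothetical example)
-- ref() declares a dependency on stg_orders, so dbt builds that model first
-- and substitutes the correct schema-qualified table name at compile time.
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_amount
FROM {{ ref('stg_orders') }}
GROUP BY customer_id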
All of these pieces live inside your dbt project folder, which follows a clear and simple structure. A dbt project structure looks like this:
my_dbt_project/
├── dbt_project.yml # Config file
├── models/ # Where your SQL models live
│ ├── staging/
│ └── marts/
├── macros/ # Reusable SQL code
├── tests/ # Custom tests
├── seeds/ # CSVs to load
└── snapshots/ # Version-tracking tables
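The dbt_project.yml file at the root tells dbt where everything lives and how to materialize models. A minimal sketch might look like this (the project name, profile name, and materializations are just examples):

# dbt_project.yml (minimal example)
name: 'my_dbt_project'
version: '1.0.0'
config-version: 2

# Must match a profile defined in ~/.dbt/profiles.yml
profile: 'my_dbt_project'

model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
macro-paths: ["macros"]
test-paths: ["tests"]

models:
  my_dbt_project:
    staging:
      +materialized: view
    marts:
      +materialized: table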
Installing dbt Core (Step-by-Step)
If you’re getting started with dbt Core, the good news is it’s free, open-source, and runs right from your terminal. Here’s how to get started step-by-step:
Step 1: Install Python
First, check that a recent version of Python 3 is already installed (current dbt Core releases require at least Python 3.8, and the newest releases may need 3.9 or later):
python --version
If not, head over to python.org and grab the latest version.
Also, dbt needs pip (Python’s package manager). Update it just in case:
python -m pip install --upgrade pip
Step 2: Create a Virtual Environment
A virtual environment keeps dbt and its dependencies separate from other projects, so create and activate one before installing dbt. A quick sketch follows.
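Python’s built-in venv module is enough for this; the environment name dbt-env below is only an example:

# macOS / Linux
python -m venv dbt-env
source dbt-env/bin/activate

# Windows (PowerShell)
python -m venv dbt-env
dbt-env\Scripts\Activate.ps1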
Step 3: Install dbt Core with Your Warehouse Adapter
Pick your data warehouse and install the matching dbt package. Example for BigQuery:
pip install dbt-bigquery
Other options include:
- dbt-postgres
- dbt-snowflake
- dbt-redshift
- dbt-databricks
Note: dbt Core only supports one adapter per environment. If you switch warehouses later, you’ll need to uninstall the current adapter first. Also, some adapters (like dbt-snowflake) may require additional system packages or git for certain dependencies.
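For example, moving the same environment from BigQuery to Snowflake might look like this (run inside your activated virtual environment):

pip uninstall dbt-bigquery
pip install dbt-snowflake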
Step 4: Check Installation: dbt --version
It will display your dbt version and the installed adapter.
Running Your First dbt Model
Let’s create your first project and run a model.
Step 1: Initialize Your Project: dbt init my_first_project
Follow the prompts to set up your profile and choose your warehouse.
Step 2: Configure the profiles.yml file
After initializing, dbt creates a skeleton project, but it also needs to know how to connect to your warehouse. That’s done via a profiles.yml file, usually located in ~/.dbt/.
Here’s a basic example for BigQuery:
my_first_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: your_dataset_name
      keyfile: /path/to/your/service-account-key.json
      threads: 1
      timeout_seconds: 300
Make sure to replace the placeholders (your-gcp-project-id, your_dataset_name, etc.) with your actual values.
View the full dbt adapter setup guide here.
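For comparison, if you installed dbt-postgres instead, a minimal profile would look roughly like this (all values are placeholders):

# ~/.dbt/profiles.yml -- example for dbt-postgres
my_first_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user
      password: your_password
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4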
Step 3: Go Into the Project Folder
cd my_first_project
Step 4: Create a Simple SQL Model
Inside the models/ folder, create a file called hello.sql:
-- models/hello.sql
SELECT 'Hello, dbt!' AS message
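Optionally, to see ref() in action, you could add a second model that builds on hello (the file name hello_upper.sql is just an example); dbt will work out that it has to build hello first:

-- models/hello_upper.sql (optional example)
-- ref('hello') declares a dependency, so dbt runs hello before this model.
SELECT UPPER(message) AS shouted_message
FROM {{ ref('hello') }}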
Step 5: Run the Model
dbt run
This will execute the SQL and create a table/view in your warehouse.
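By default, dbt Core materializes models as views. If you’d rather get a physical table, you can add a config block at the top of the model; this is optional for the tutorial:

-- models/hello.sql
{{ config(materialized='table') }}

SELECT 'Hello, dbt!' AS message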
Step 6: Check the Output
Go to your data warehouse. You should see a new object called hello (a view by default, or a table if you configured it that way) with a single row: “Hello, dbt!”
Testing dbt Models
You can test your models to catch bad data early. Create a schema.yml file:
# models/schema.yml
version: 2

models:
  - name: hello
    columns:
      - name: message
        tests:
          - not_null
Now run:
dbt test
It checks that the message column in your model doesn’t have any nulls.
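Alongside these generic tests, you can also write a custom (singular) test: a SQL file in the tests/ folder that returns the rows that should not exist. The test passes when the query returns zero rows; the file name below is just an example:

-- tests/assert_message_not_empty.sql
-- Fails if any row in hello has an empty message
SELECT *
FROM {{ ref('hello') }}
WHERE message = ''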
Best Practices and Strategies
- Use folder structures like staging/, intermediate/, and marts/ to keep things tidy (see the sketch after this list)
- Always use ref() to reference models (don’t hardcode table names)
- Write tests for important columns like IDs, timestamps, and metrics
- Document your models and use dbt docs to share the documentation with your team
- Start simple, then scale your project as needed
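As a sketch of the first two points, a staging model usually reads from a declared source and is then referenced by downstream marts; the source, table, and column names here are hypothetical:

# models/staging/sources.yml (hypothetical source definition)
version: 2

sources:
  - name: raw_shop
    tables:
      - name: orders

-- models/staging/stg_orders.sql
SELECT
    id AS order_id,
    customer_id,
    amount,
    created_at
FROM {{ source('raw_shop', 'orders') }}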
Handy dbt CLI Commands
Command | Description
dbt init | Creates a new dbt project
dbt debug | Tests the connection to your warehouse and checks project setup
dbt deps | Installs dependencies from packages.yml
dbt run | Runs all models in the project
dbt run --select tag:tag_name | Runs models with a specific tag
dbt test | Runs all tests defined in schema.yml
dbt test --select model_name | Tests a specific model
dbt seed | Loads CSV seed files into your warehouse
dbt snapshot | Runs snapshot logic for slowly changing dimensions
dbt docs generate | Generates documentation files
dbt docs serve | Starts a local server to view docs and the lineage graph
dbt build | Runs models, tests, seeds, and snapshots in one command
dbt clean | Removes dbt-generated artifacts (like compiled files)
dbt list | Lists all resources (models, tests, snapshots, etc.)
dbt compile | Compiles dbt models to raw SQL
dbt run-operation macro_name | Runs a macro from the CLI
dbt source freshness | Checks the freshness of your sources
dbt ls --resource-type model | Lists all models in the project
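In day-to-day use, a typical local workflow chains a few of these together; this is just one common pattern, not the only way to do it:

dbt deps      # install packages from packages.yml
dbt seed      # load reference CSVs
dbt run       # build models
dbt test      # validate them

# or run models, tests, seeds, and snapshots in one pass:
dbt build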
Conclusion
Getting started with dbt Core gives you a structured, scalable way to build reliable data pipelines, especially when your data lives in a modern warehouse like BigQuery, Snowflake, or Redshift. But transformation is just one part of the puzzle.
If you’re looking for a simpler, faster way to get started with dbt Core and build transformation pipelines without the usual setup hassle, try Hevo Transformer. It natively integrates with dbt Core, allowing you to write, schedule, and orchestrate transformations directly within your data pipelines, without switching tools or requiring additional infrastructure.
With Hevo Transformer, you get the best of both worlds: the flexibility of dbt and the simplicity of Hevo’s no-code platform.
Start transforming data smarter, not harder.