If you’re getting started with dbt Core, you’re probably dealing with messy, scattered data that’s too raw to be useful right away. That’s where dbt (data build tool) steps in, helping analysts and engineers turn raw data into clean, structured models, ready for analysis.
Whether you’re a data professional or just getting started with analytics, this guide walks you through installing dbt Core and building your first model.
What is dbt and Why Do We Need It?
When companies collect data from multiple sources (apps, websites, CRMs, and more), it’s rarely clean. It’s inconsistent, often incomplete, and definitely not analysis-ready. dbt solves this problem by acting as the transformation layer of your ELT pipeline.
dbt, short for Data Build Tool, is a command-line tool designed to help data professionals transform raw data in their warehouse using modular SQL. Instead of writing large, unmanageable SQL scripts, dbt encourages breaking queries into reusable models that can be tested, documented, and version-controlled.
With dbt Core, you can write SQL as code, just like you would write Python or JavaScript. It brings software engineering best practices like version control, CI/CD, modularity, and automated testing to analytics engineering. Unlike traditional ETL tools like Informatica or Talend, dbt is open-source, lightweight, and uses SQL, making it highly approachable for analysts.
There are two flavors of dbt:
- dbt Core: The free and open-source CLI-based tool. Ideal for local and small-scale deployments.
- dbt Cloud: A managed service with GUI, job scheduling, and team collaboration features. Ideal for enterprise-grade deployments.
Where does dbt fit in your data stack? Right in the transformation layer. It doesn’t extract or load data, it transforms it. dbt sits between your data warehouse (like Snowflake, BigQuery, or Redshift) and your BI tools (like Looker or Tableau), helping you convert raw data into clean, analytics-ready datasets.
Use cases include:
- Building dimensional models and data marts
- Transforming and staging raw ingestion tables
- Ensuring quality with tests before dashboards break
- Automating data documentation and lineage graphs
- Enabling CI/CD in your analytics pipelines
One of the most important parts of getting started with dbt Core is familiarizing yourself with its foundational concepts, like models, tests, and macros.
dbt Core Concepts
Now that you know where dbt fits and what it’s good for, let’s break down its core building blocks. These concepts form the foundation of every dbt project:
- Models: These are SQL files that define transformations. Think of them like views or tables built from raw data.
- ref(): Instead of hardcoding table names, you use ref('model_name') to link models. This helps dbt understand dependencies and build everything in the correct order (see the example after this list).
- Tests: You can write dbt tests to check that IDs are unique, values aren’t missing, and relationships between tables hold (referential integrity).
- Documentation: You can document models and columns directly in dbt. Then, generate a beautiful visual site with lineage graphs.
- Macros: Reusable SQL snippets using Jinja templates. They help you avoid repeating code across models.
- Seeds: CSV files that dbt can load into your warehouse. Great for small reference tables.
- Snapshots: Used to track historical changes in records over time.
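To make ref() concrete, here is a minimal sketch of a mart model that builds on a hypothetical staging model called stg_orders (all names here are just examples, not part of any starter project):

-- models/marts/orders_summary.sql (hypothetical example)
-- ref() declares a dependency on stg_orders, so dbt builds that model first
-- and substitutes the correct schema-qualified table name at compile time.
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_amount
FROM {{ ref('stg_orders') }}
GROUP BY customer_id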
All of these pieces live inside your dbt project folder, which follows a clear and simple structure. A dbt project structure looks like this:
my_dbt_project/
├── dbt_project.yml # Config file
├── models/ # Where your SQL models live
│ ├── staging/
│ └── marts/
├── macros/ # Reusable SQL code
├── tests/ # Custom tests
├── seeds/ # CSVs to load
└── snapshots/ # Version-tracking tables
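The dbt_project.yml file at the root tells dbt where everything lives and how to materialize models. A minimal sketch might look like this (the project name, profile name, and materializations are just examples):

# dbt_project.yml (minimal example)
name: 'my_dbt_project'
version: '1.0.0'
config-version: 2

# Must match a profile defined in ~/.dbt/profiles.yml
profile: 'my_dbt_project'

model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
macro-paths: ["macros"]
test-paths: ["tests"]

models:
  my_dbt_project:
    staging:
      +materialized: view
    marts:
      +materialized: table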
Installing dbt Core (Step-by-Step)
If you’re getting started with dbt Core, the good news is it’s free, open-source, and runs right from your terminal. Here’s how to get started step-by-step:
Step 1: Install Python
First, check that a recent version of Python 3 is already installed (current dbt Core releases require at least Python 3.8, and the newest releases may need 3.9 or later):
python --version
If not, head over to python.org and grab the latest version.
Also, dbt needs pip (Python’s package manager). Update it just in case:
python -m pip install --upgrade pip
Step 2: Create a Virtual Environment
A virtual environment keeps dbt and its dependencies separate from other projects, so create and activate one before installing dbt. A quick sketch follows.
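Python’s built-in venv module is enough for this; the environment name dbt-env below is only an example:

# macOS / Linux
python -m venv dbt-env
source dbt-env/bin/activate

# Windows (PowerShell)
python -m venv dbt-env
dbt-env\Scripts\Activate.ps1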
Step 3: Install dbt Core with Your Warehouse Adapter
Pick your data warehouse and install the matching dbt package. Example for BigQuery:
pip install dbt-bigquery
Other options include:
- dbt-postgres
- dbt-snowflake
- dbt-redshift
- dbt-databricks
Note: dbt Core only supports one adapter per environment. If you switch warehouses later, you’ll need to uninstall the current adapter first. Also, some adapters (like dbt-snowflake) may require additional system packages or git for certain dependencies.
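For example, moving the same environment from BigQuery to Snowflake might look like this (run inside your activated virtual environment):

pip uninstall dbt-bigquery
pip install dbt-snowflake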
Step 4: Check Installation: dbt --version
It will display your dbt version and the installed adapter.
Running Your First dbt Model
Let’s create your first project and run a model.
Step 1: Initialize Your Project: dbt init my_first_project
Follow the prompts to set up your profile and choose your warehouse.
Step 2: Configure the profiles.yml file
After initializing, dbt creates a skeleton project, but it also needs to know how to connect to your warehouse. That’s done via a profiles.yml file, usually located in ~/.dbt/.
Here’s a basic example for BigQuery:
my_first_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: your_dataset_name
      keyfile: /path/to/your/service-account-key.json
      threads: 1
      timeout_seconds: 300
Make sure to replace the placeholders (your-gcp-project-id, your_dataset_name, etc.) with your actual values.
View the full dbt adapter setup guide here.
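For comparison, if you installed dbt-postgres instead, a minimal profile would look roughly like this (all values are placeholders):

# ~/.dbt/profiles.yml -- example for dbt-postgres
my_first_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user
      password: your_password
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4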
Step 3: Go Into the Project Folder
cd my_first_project
Step 4: Create a Simple SQL Model
Inside the models/ folder, create a file called hello.sql:
-- models/hello.sql
SELECT 'Hello, dbt!' AS message
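Optionally, to see ref() in action, you could add a second model that builds on hello (the file name hello_upper.sql is just an example); dbt will work out that it has to build hello first:

-- models/hello_upper.sql (optional example)
-- ref('hello') declares a dependency, so dbt runs hello before this model.
SELECT UPPER(message) AS shouted_message
FROM {{ ref('hello') }}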
Step 5: Run the Model
dbt run
This will execute the SQL and create a table/view in your warehouse.
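By default, dbt Core materializes models as views. If you’d rather get a physical table, you can add a config block at the top of the model; this is optional for the tutorial:

-- models/hello.sql
{{ config(materialized='table') }}

SELECT 'Hello, dbt!' AS message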
Step 6: Check the Output
Go to your data warehouse. You should see a new object called hello (a view by default, or a table if you configured it that way) with a single row: “Hello, dbt!”
Testing dbt Models
You can test your models to catch bad data early. Create a schema.yml file:
# models/schema.yml
version: 2

models:
  - name: hello
    columns:
      - name: message
        tests:
          - not_null
Now run:
dbt test
It checks that the message column in your model doesn’t have any nulls.
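Alongside these generic tests, you can also write a custom (singular) test: a SQL file in the tests/ folder that returns the rows that should not exist. The test passes when the query returns zero rows; the file name below is just an example:

-- tests/assert_message_not_empty.sql
-- Fails if any row in hello has an empty message
SELECT *
FROM {{ ref('hello') }}
WHERE message = ''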
Best Practices and Strategies
- Use folder structures like staging/, intermediate/, and marts/ to keep things tidy (see the sketch after this list)
- Always use ref() to reference models (don’t hardcode table names)
- Write tests for important columns like IDs, timestamps, and metrics
- Document your models and use dbt docs to share the documentation with your team
- Start simple, then scale your project as needed
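As a sketch of the first two points, a staging model usually reads from a declared source and is then referenced by downstream marts; the source, table, and column names here are hypothetical:

# models/staging/sources.yml (hypothetical source definition)
version: 2

sources:
  - name: raw_shop
    tables:
      - name: orders

-- models/staging/stg_orders.sql
SELECT
    id AS order_id,
    customer_id,
    amount,
    created_at
FROM {{ source('raw_shop', 'orders') }}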
Handy dbt CLI Commands
Command | Description
dbt init | Creates a new dbt project
dbt debug | Tests the connection to your warehouse and checks project setup
dbt deps | Installs dependencies from packages.yml
dbt run | Runs all models in the project
dbt run --select tag:tag_name | Runs models with a specific tag
dbt test | Runs all tests defined in schema.yml
dbt test --select model_name | Tests a specific model
dbt seed | Loads CSV seed files into your warehouse
dbt snapshot | Runs snapshot logic for slowly changing dimensions
dbt docs generate | Generates documentation files
dbt docs serve | Starts a local server to view docs and the lineage graph
dbt build | Runs models, tests, seeds, and snapshots in one command
dbt clean | Removes dbt-generated artifacts (like compiled files)
dbt list | Lists all resources (models, tests, snapshots, etc.)
dbt compile | Compiles dbt models to raw SQL
dbt run-operation macro_name | Runs a macro from the CLI
dbt source freshness | Checks the freshness of your sources
dbt ls --resource-type model | Lists all models in the project
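In day-to-day use, a typical local workflow chains a few of these together; this is just one common pattern, not the only way to do it:

dbt deps      # install packages from packages.yml
dbt seed      # load reference CSVs
dbt run       # build models
dbt test      # validate them

# or run models, tests, seeds, and snapshots in one pass:
dbt build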
Conclusion
Getting started with dbt Core gives you a structured, scalable way to build reliable data pipelines, especially when your data lives in a modern warehouse like BigQuery, Snowflake, or Redshift. But transformation is just one part of the puzzle.
If you’re looking for a simpler, faster way to get started with dbt Core and build transformation pipelines without the usual setup hassle, try Hevo Transformer. It natively integrates with dbt Core, allowing you to write, schedule, and orchestrate transformations directly within your data pipelines, without switching tools or requiring additional infrastructure.
With Hevo Transformer, you get the best of both worlds: the flexibility of dbt and the simplicity of Hevo’s no-code platform.
Start transforming data smarter, not harder.