Deploying data models for transformations can be complex and time-consuming. To ensure that the deployment process is reliable, repeatable, and scalable, many organizations rely on continuous integration and continuous delivery (CI/CD) pipelines. dbt, a popular data transformation tool, is designed to work with Git and can be integrated with your organization’s CI/CD pipeline to streamline the deployment process.
In this article, we will explore how to use both dbt Cloud and dbt GitHub Actions to deploy dbt projects to production.
Using dbt Cloud for Project Deployment in Production
There are multiple ways to configure, schedule, and run your dbt models.
- The most direct approach is to spin up a compute instance, install the required packages, and set up a cron job that performs a git pull and a dbt run on a schedule. Since dbt only compiles your queries and sends the resulting SQL to the data warehouse, it doesn't need many resources, and the instance often sits idle because it typically runs for only a few minutes to under an hour each day. A minimal sketch of this setup follows this list.
- For businesses that already have Airflow in their infrastructure, you can write a DAG that triggers dbt.
- You can use the open-source dbt Core and schedule your runs with dbt GitHub Actions, which is discussed in the next section.
- Or, instead of dbt GitHub Actions, you can opt for dbt Cloud, which offers more flexibility for configuring your dbt runs.
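For the first option, a minimal sketch of the cron-based setup could look like the following crontab entry; the project path and log location are assumptions for illustration.

# hypothetical crontab entry: pull the latest code and run dbt daily at 7 AM
0 7 * * * cd /opt/dbt/my_dbt_project && git pull && dbt run >> /var/log/dbt_run.log 2>&1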
To deploy your dbt models using dbt Cloud, follow the steps below:
- Step 1: Sign in to dbt Cloud and create a new project by clicking on the “New Project” button in the top-right corner of the dashboard.
- Step 2: You will be prompted to name your project, enter your database/data warehouse connection details, and finally set up a dbt Cloud-managed repository or connect directly to a supported Git provider.
- Step 3: Click the Develop button on the upper-left side of your screen to get started on development. You can use the dbt CLI or the dbt Cloud IDE to create and test your dbt models, and you can run dbt commands and view logs and job history in the dbt Cloud UI.
- Step 4: Once your models are ready to deploy, click on the “Deploy” button in the dbt Cloud UI to deploy your project to production. You can choose to deploy individual models or the entire project.
Using dbt GitHub Actions for Project Deployment in Production
dbt works well with Git, making it ideal for integration with your CI/CD pipeline. dbt GitHub Actions provides a simple way to automate your software workflows and the various stages of your build, test, and deployment process. GitHub also provides virtual machines to run these workflows, making it easy to get started, and offers a free tier. Here, you will learn how dbt GitHub Actions can be used to run your dbt models on a schedule using cron. To do that, follow the steps below:
- Step 1: If you don't have a GitHub repository and a dbt project already set up, log in to your GitHub account and click the Create repository option. Then enter the following command in a terminal window to create your new dbt project:
$ dbt init <project_name>
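If the project isn't yet under version control, a typical sequence for pushing it to your new GitHub repository might look like the following; the project name and repository URL are placeholders.

$ cd <project_name>
$ git init
$ git add .
$ git commit -m "Initial dbt project"
$ git remote add origin https://github.com/<your-org>/<your-repo>.git
$ git push -u origin main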
- Step 2: You connect dbt to your data warehouse, database, or data lake using a profile: a YAML file containing the connection details for your data platform. Open the profiles.yml file in the root of your dbt project, or create one if it doesn't exist already, and enter the connection details for your platform. For instance, the following profile assumes Snowflake; you can adapt it for other data platforms like Google BigQuery, Amazon Redshift, and Databricks accordingly.
default:
  target: dev   # the default target dbt runs against when no --target is given
  outputs:
    dev:
      type: snowflake
      threads: 1
      account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('DBT_SNOWFLAKE_USERNAME') }}"
      role: "{{ env_var('DBT_SNOWFLAKE_ROLE') }}"
      password: "{{ env_var('DBT_SNOWFLAKE_PW') }}"
      database: "{{ env_var('DBT_SNOWFLAKE_DATABASE') }}"
      warehouse: "{{ env_var('DBT_SNOWFLAKE_WAREHOUSE') }}"
      schema: "{{ env_var('DBT_SNOWFLAKE_SCHEMA') }}"
      client_session_keep_alive: False
      query_tag: github_action_query
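Before wiring this profile into CI, you can verify that it resolves and connects correctly by exporting the environment variables locally and running dbt debug, a standard dbt command that checks your installation and connection.

# export the connection variables referenced in profiles.yml, for example:
$ export DBT_SNOWFLAKE_ACCOUNT=<account>
$ export DBT_SNOWFLAKE_USERNAME=<user>
# ...repeat for the remaining DBT_SNOWFLAKE_* variables, then:
$ dbt debug --target dev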
- Step 3: Before scheduling a dbt GitHub run, you have to set up credentials that the GitHub Actions service uses to authenticate against your data warehouse. It is good practice to create a dedicated service credential for this instead of using your own account; that way, the scheduled runs keep working even if you lose access to your account for some reason. A sketch of one way to do this follows.
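On Snowflake, for example, a minimal sketch of creating such a dedicated service user with the SnowSQL CLI might look like the following; the role, user, and object names are hypothetical, and you would need a role with sufficient privileges (such as SECURITYADMIN) to run it.

# hypothetical: create a dedicated CI role and user; names and grants are placeholders
$ snowsql -q "CREATE ROLE IF NOT EXISTS DBT_CI_ROLE;
    CREATE USER IF NOT EXISTS DBT_CI_USER PASSWORD = '<strong-password>' DEFAULT_ROLE = DBT_CI_ROLE;
    GRANT ROLE DBT_CI_ROLE TO USER DBT_CI_USER;
    GRANT USAGE ON WAREHOUSE <warehouse> TO ROLE DBT_CI_ROLE;
    GRANT USAGE ON DATABASE <database> TO ROLE DBT_CI_ROLE;"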
- Step 4: If not already present, create a .github/workflows/ directory in your repository. Inside the folder, create a YAML file for scheduling your run using dbt GitHub Actions. For instance, here, a file named "schedule_dbt_run.yml" is created with the following code:
name: schedule_dbt_run

on:
  schedule:
    # run at 7 AM every day
    # https://crontab.guru <-- for generating cron expressions
    - cron: "0 7 * * *"
  push:
    branches:
      # run on push to the development branch
      - development

env:
  DBT_PROFILES_DIR: ./
  DBT_SNOWFLAKE_ACCOUNT: ${{ secrets.DBT_SNOWFLAKE_ACCOUNT }}
  DBT_SNOWFLAKE_USERNAME: ${{ secrets.DBT_SNOWFLAKE_USERNAME }}
  DBT_SNOWFLAKE_PW: ${{ secrets.DBT_SNOWFLAKE_PW }}
  DBT_SNOWFLAKE_ROLE: ${{ secrets.DBT_SNOWFLAKE_ROLE }}
  DBT_SNOWFLAKE_DATABASE: ${{ secrets.DBT_SNOWFLAKE_DATABASE }}
  DBT_SNOWFLAKE_WAREHOUSE: ${{ secrets.DBT_SNOWFLAKE_WAREHOUSE }}
  DBT_SNOWFLAKE_SCHEMA: ${{ secrets.DBT_SNOWFLAKE_SCHEMA }}

jobs:
  schedule_dbt_run:
    name: schedule_dbt_run
    runs-on: ubuntu-latest
    steps:
      - name: Check out
        uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: |
          pip install dbt-snowflake   # install the adapter for your data platform
          dbt deps
      # dbt commands below; use --target prod/dev to run against a specific environment
      - name: Run dbt models
        run: dbt run
      - name: Test dbt models
        run: dbt test
The above workflow is triggered on a cron schedule that runs at 7 AM UTC every day, as well as on every push to the development branch.
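If you also want the ability to trigger the job on demand, GitHub Actions supports a workflow_dispatch trigger, a standard event that adds a "Run workflow" button in the Actions tab. You could extend the on: block like this:

on:
  schedule:
    - cron: "0 7 * * *"
  push:
    branches:
      - development
  workflow_dispatch:   # enables manual runs from the GitHub Actions UI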
As you can see, this workflow runs against the development environment rather than production, since the profile's default target is dev. This ensures that any changes are verified, reviewed, and tested before they are deployed to the production environment.
To deploy to the production environment, you can point the run and test commands at the prod target instead:
- name: Install dependencies
  run: |
    pip install dbt-snowflake
    dbt deps   # only installs packages, so no --target is needed here
# Add dbt seed or other commands here if needed
- name: Run dbt models
  run: dbt run --target prod
- name: Test dbt models
  run: dbt test --target prod
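Note that --target prod only works if your profiles.yml defines a matching prod output alongside dev. A minimal sketch, assuming hypothetical prod-specific environment variables for the database and schema, might look like this (nested under default: outputs:):

prod:
  type: snowflake
  threads: 4
  account: "{{ env_var('DBT_SNOWFLAKE_ACCOUNT') }}"
  user: "{{ env_var('DBT_SNOWFLAKE_USERNAME') }}"
  role: "{{ env_var('DBT_SNOWFLAKE_ROLE') }}"
  password: "{{ env_var('DBT_SNOWFLAKE_PW') }}"
  database: "{{ env_var('DBT_SNOWFLAKE_PROD_DATABASE') }}"   # hypothetical prod-specific variable
  warehouse: "{{ env_var('DBT_SNOWFLAKE_WAREHOUSE') }}"
  schema: "{{ env_var('DBT_SNOWFLAKE_PROD_SCHEMA') }}"       # hypothetical prod-specific variable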
Like many development teams, you can also require every change to the code base to go through a pull request and have the workflow run once changes are merged to the main branch. You achieve this by simply changing the push trigger:
push:
  branches:
    - main
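If you want checks to run on the pull requests themselves before they merge, GitHub Actions also provides a standard pull_request event that you could add alongside (or instead of) the push trigger:

on:
  pull_request:
    branches:
      - main   # run the dbt checks on PRs targeting main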
- Step 5: Finally, remember to configure the secrets in your GitHub repository, as the workflow refers to these variables during every run. To do that, go to Settings > Secrets > New repository secret and add all the variables referenced in the YAML above. For instance, the Snowflake account variable would be stored as 'DBT_SNOWFLAKE_ACCOUNT'.
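If you prefer the command line, the same secrets can be added with the GitHub CLI's gh secret set command; the values below are placeholders.

$ gh secret set DBT_SNOWFLAKE_ACCOUNT --body "<account>"
$ gh secret set DBT_SNOWFLAKE_USERNAME --body "<user>"
$ gh secret set DBT_SNOWFLAKE_PW --body "<password>"
# ...repeat for the remaining DBT_SNOWFLAKE_* secrets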
Key Takeaways
With GitHub offering a sufficient free quota, you should be able to deploy projects to production using dbt GitHub Actions. As your dbt models run automatically, you can rest assured that your data arrives on time in an analysis-ready form. However, extracting and loading data from multiple sources still requires a lot of manual pipeline building and monitoring.
For cases where you rarely need to replicate data, your engineering team can easily handle it. For frequent, high-volume data transfers from multiple sources, however, your engineering team would need to constantly monitor the pipelines and fix any breakages. Or you can simply hop onto a smooth ride with a cloud-based ELT solution like Hevo Data, which automates the data integration process for you and runs your dbt projects to transform the data in your data warehouse. At this time, dbt Core™ on Hevo is in BETA; please reach out to Hevo Support or your account executive to enable it for your team.
Visit our Website to Explore Hevo
Offering 150+ plug-and-play integrations and saving countless hours of manual data cleaning and standardizing, Hevo Data also provides in-built pre-load data transformations that get the job done in minutes via a simple drag-and-drop interface or your custom Python scripts.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and simplify your data integration process. Check out the pricing details to understand which plan fulfills all your business needs.
Share your experience of learning about deploying projects using dbt GitHub Actions! Let us know in the comments section below!