Building Snowflake CI CD Pipeline using Azure DevOps & FlyWay Simplified: 4 Critical Steps

By: Published: January 27, 2022

Snowflake CI CD Pipeline | Hevo Data

The acronyms CI and CD are frequently used in modern development practices and DevOps. Continuous integration (CI) refers to a fundamental DevOps best practice in which developers frequently merge code changes into a central repository where automated builds and tests are run. On the other hand, Continuous Delivery (CD) is an extension of Continuous Integration as it automatically deploys all code changes to the testing and/or production environment following the build stage.

Upon a complete walk-through of this article, you will gain a decent understanding of Snowflake along with the key features that it offers. This article will also provide you with a step-by-step guide on how to build a Snowflake CI CD Pipeline in a seamless manner. Read along to learn more about Snowflake CI CD Pipelines.

Table of Contents

Prerequisites

  • Hands-on experience with Git.
  • An active Snowflake account.
  • An active Azure DevOps Services account.

What is Snowflake?

Snowflake CI CD: Snowflake Logo | Hevo Data
Image Source

Snowflake is one of the most popular Cloud Data Warehouses that offers a plethora of features without compromising simplicity. It scales automatically, both up and down, to offer the best Performance-to-Cost ratio. The distinguishing feature of Snowflake is that it separates Computing from Storage. This is significant as almost every other Data Warehouse, including Amazon Redshift, combines the two, implying that you must consider the size of your highest workload and then incur the costs associated with it. 

Snowflake requires no hardware or software to be Chosen, Installed, Configured, or Managed, making it ideal for organizations that do not want to dedicate resources to the Setup, Maintenance, and Support of In-house Servers. It allows you to store all of your data in a centralized location and size your Compute independently. For example, if you require real-time data loads for complex transformations but only have a few complex queries in your reporting, you can script a massive Snowflake Warehouse for the data load and then scale it back down after it’s finished – all in real-time. This will save you a significant amount of money without jeopardizing your solution goals.

Key Features of Snowflake

Some of the key features of Snowflake are as follows:

  • Scalability: The Compute and Storage resources are separated in Snowflakes’ Multi-Cluster Shared Data Architecture. This strategy gives users the ability to scale up resources when large amounts of data is required to be loaded quickly and scale back down when the process is complete without disrupting any kind of operation.
  • No Administration Required: It enables businesses to set up and manage a solution without requiring extensive involvement from Database Administrators or IT teams. It does not necessitate the installation of software or the commissioning of hardware.
  • Security: Snowflake houses a wide range of security features, from how users access Snowflake to how the data is stored. To restrict access to your account, you can manage Network Policies by whitelisting IP addresses. Snowflake supports a variety of authentication methods, including Two-Factor Authentication and SSO via Federated Authentication.
  • Support for Semi-Structured Data: Snowflake’s architecture enables the storage of Structured and Semi-Structured data in the same location by utilizing the VARIANT schema on the Read data type. VARIANT can store both Structured and Semi-structured data. Once the data is loaded, Snowflake automatically parses it, extracts the attributes out of it, and stores it in a Columnar Format.

What is a CI/CD Pipeline?

A CI/CD Pipeline is a set of procedures that must be followed in order to deliver a new version of the software. Continuous Integration/Continuous Delivery (CI/CD) Pipelines are a set of practices aimed at improving software delivery through the use of either a DevOps or a Site Reliability Engineering (SRE) approach. A CI/CD Pipeline incorporates monitoring and automation to improve the Application Development process, particularly during the Integration and Testing phases, as well as during Delivery and Deployment. Although each step of a CI/CD Pipeline can be performed manually, the true value of a CI/CD Pipeline is realized through automation.

Many software development teams are geographically dispersed or isolated, but Continuous Integration (CI) enables rapid development while avoiding Merge Conflicts, Bugs, and Duplication. Continuous Integration always keeps the main branch up to date, but it can also allow for short-term isolated side or feature branches for minor changes that can eventually be merged into the main branch.

Continuous Delivery enables rapid, incremental development and allows development teams to build and release software at any time. It also assists DevOps teams in lowering costs and increasing the speed with which new releases are deployed. Continuous Delivery necessitates a highly repeatable structure and is frequently regarded as an extension of Continuous Integration. Later in this article, you will learn how to build a Snowflake CI CD Pipeline.

What is Azure DevOps?

Azure DevOps is a Software as a Service (SaaS) platform offered by Microsoft that provides an end-to-end DevOps toolchain for developing and deploying software. It also integrates with the majority of the market’s leading tools and is an excellent choice for orchestrating a DevOps toolchain. Azure DevOps offers developer services that enable teams to plan their work, collaborate on code development, and build and deploy applications. Azure DevOps fosters a culture and set of procedures that bring together developers, project managers, and contributors to collaborate on software development. It enables organizations to create and improve products at a much faster rate than traditional software development approaches allow.

What is FlyWay?

Snowflake CI CD: Flyway Logo | Hevo Data
Image Source

FlyWay is an Open-Source tool licensed under Apache License 2.0 that enables users to implement automated and version-based Database Migrations. It enables you to define the necessary update operations in a SQL script or Java code. You can run the database migration from a Command Line Client, as part of your build process, or as part of your Java application.

The key advantage of this process is that FlyWay detects and executes the necessary update operations. As a result, you don’t need to know which SQL update statements must be executed in order to update your current database. You and your colleagues can simply define the update operations that will be used to migrate the database from one version to the next and FlyWay will detect the current version and execute the necessary update operations to update the database.

Reliably Integrate Data with Hevo’s Fully Automated No Code Data Pipeline

If yours anything like the 1000+ data-driven companies that use Hevo, more than 70% of the business apps you use are SaaS applications. Integrating the data from these sources in a timely way is crucial to fuel analytics and the decisions that are taken from it. But given how fast API endpoints etc can change, creating and managing these pipelines can be a soul-sucking exercise.
Hevo’s no-code data pipeline platform lets you connect over 150+ sources in a matter of minutes to deliver data in near real-time to your warehouse such as Snowflake. What’s more, the in-built transformation capabilities and the intuitive UI means even non-engineers can set up pipelines and achieve analytics-ready data in minutes. 

All of this combined with transparent pricing and 24×7 support makes us the most loved data pipeline software in terms of user reviews.

Take our 14-day free trial to experience a better way to manage data pipelines.

Get started for Free with Hevo!

How to Build a Snowflake CI/CD Pipeline using Azure DevOps and Flyway?

Building a Snowflake CI CD Pipeline is broadly a 4-step process. Follow the steps given below to run and deeply a Snowflake CI CD Pipeline:

Step 1: Create a Demo Project

The first step involved in building a Snowflake CI CD pipeline requires you to create a demo Azure DevOps project. Follow the steps given below to do so:

  • Create databases and a user by leveraging the following script:
-- Create Databases
CREATE DATABASE FLYWAY_DEMO COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_DEV COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_QA COMMENT = 'Azure DevOps deployment test';

-- Create a Deploy User
create user devopsuser password='<mypassword>' default_role = sysadmin;
  • Sign In to your Azure DevOps account using the appropriate credentials.
  • Choose the Organization and click on the Blue-colored +New Project button.
Snowflake CI CD: Creating new Project | Hevo Data
Image Source
  • Give a unique and concise name to your project. You can also add a description for it. Let’s name the project as Snowflake_Flyway for the sake of this tutorial.
  • Now, select the Visibility option for your project and click on the Create button.
Snowflake CI CD: Azure DevOps project | Hevo Data
Image Source

Step 2: Set up the Production Environment

You must have an Environment in order to add the Approval step. Follow the steps given below to create the necessary Environments and Approvals:

  • Head back to the Azure DevOps home page.
  • Navigate to the left-side navigation bar and click on the Environments option.
  • Give a unique name to the Production Environment and click on the Create button.
Snowflake CI CD: Production Environment| Hevo Data
Image Source
  • To create Approval for the Production Environment, click on the three vertical dots located next to the Add Resource button.
  • Click on the Approvals and Checks option to add a list of Approvers.
Snowflake CI CD: Adding Resources| Hevo Data
Image Source

Step 3: Create a Library Variable Group

When you have a set of variables that will be used in multiple pipelines, you can create a Variable Group once and reference it in multiple groups. Libraries are used to securely store variables and files that will be used in your Snowflake CI CD pipeline. Follow the steps given below to create a Library Variable Group:

  • In the left navigation bar, click on Library present under the Pipelines option.
  • On the Library page, navigate to the Variable Groups tab.
  • Click on the +Variable Group button to create a new Library Variable Group.
Snowflake CI CD: Creating Variable Group | Hevo Data
Image Source
  • Give a unique name to the group and add the following variables to it.
SNOWFLAKE_JDBC_URL=jdbc:snowflake://
SNOWFLAKE_ACCOUNT_NAME=<account_name>.<region.cloud_platform>.snowflakecomputing.com
SNOWFLAKE_WAREHOUSE=
SNOWFLAKE_ROLENAME=sysadmin
SNOWFLAKE_DEVOPS_USERNAME=<DeployUserName>
# mark as a secret variable type
SNOWFLAKE_DEVOPS_SECRET=<DeployUserPassword>
SNOWFLAKE_AUTHENTICATOR=snowflake
  • Once you have successfully added all of the variables, do not forget to click on the Save button to the right of the Variable Group’s name. This is how your Variable Group would look like:
Snowflake CI CD: Library Variable Group | Hevo Data
Image Source

Step 4: Create and Run a Snowflake CI CD Deployment Pipeline

Now, to create a Snowflake CI CD Pipeline, follow the steps given below:

  • In the left navigation bar, click on the Pipelines option.
  • If you are creating a pipeline for the first time, click on the Create Pipeline button. In case, you already have another pipeline defined then click on the New Pipeline button.
  • On the Connect tab, select the Azure Repos Git option, and select the desired repository(Snowflake_Flyway) on the next screen.
Snowflake CI CD: Azure Repos Git | Hevo Data
Image Source
  • On the Configure your Pipeline page, select the Starter Pipeline option.
Snowflake CI CD: Configuring the Snowflake CI CD Pipeline | Hevo Data
Image Source
  • Lastly, paste the following piece of code into the Review your Final YAML page.
variables:
- group: Snowflake.Database
- name: DBNAME
  value: flyway_demo
- name: flywayartifactName
  value: DatabaseArtifacts  
- name: flywayVmImage
  value: 'ubuntu-16.04'  
- name: flywayContainerImage
  value: 'kulmam92/flyway-azure:6.2.3'  
trigger:
- master

stages:
- stage: Build
  variables:
  - name: DBNAME_POSTFIX
    value: _DEV
  jobs:
  - template: templates/snowflakeFlywayBuild.yml
    parameters:
      jobName: 'BuildDatabase'
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)

- stage: DEV
  variables:
  - name: DBNAME_POSTFIX
    value: _DEV
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: DEV
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: DEV

- stage: QA
  variables:
  - name: DBNAME_POSTFIX
    value: _QA
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: QA
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: QA

- stage: PROD
  variables:
  - name: DBNAME_POSTFIX
    value: '' # Empty string for PROD
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: PROD
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: PROD
  • Once you have successfully added the code to the editor, click on the Save and Run button.
Snowflake CI CD: Save and Run the Pipeline| Hevo Data
Image Source

Once you follow all the steps explained above in the correct sequence, you will be able to build a Snowflake CI CD Pipeline from scratch!

Conclusion

This blog introduced you to Snowflake along with the salient features that it offers. Furthermore, it introduced you to the steps required to build a Snowflake CI CD Pipeline from scratch using Azure DevOps and Flyway.

As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources. To meet this growing storage and computing needs of data,  you would require to invest a portion of your Engineering Bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse such as Snowflake for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Snowflake, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of building a Snowflake CI CD Pipeline in the comments below!

mm
Former Research Analyst, Hevo Data

Rakesh is a Cloud Engineer with a passion for data, software architecture, and writing technical content. He has experience writing articles on various topics related to data integration and infrastructure.

No Code Data Pipeline For Snowflake