The acronyms CI and CD are frequently used in modern development practices and DevOps. Continuous integration (CI) refers to a fundamental DevOps best practice in which developers frequently merge code changes into a central repository where automated builds and tests are run. On the other hand, Continuous Delivery (CD) is an extension of Continuous Integration as it automatically deploys all code changes to the testing and/or production environment following the build stage.
Upon a complete walk-through of this article, you will gain a decent understanding of Snowflake along with the key features that it offers. This article will also provide you with a step-by-step guide on how to build a Snowflake CI CD Pipeline in a seamless manner. Read along to learn more about Snowflake CI CD Pipeline.
Prerequisites
- Hands-on experience with Git.
- An active Snowflake account.
- An active Azure DevOps Services account.
What is Snowflake?
Snowflake is one of the most popular Cloud Data Warehouses that offers a plethora of features without compromising simplicity. It scales automatically, both up and down, to offer the best Performance-to-Cost ratio. The distinguishing feature of Snowflake is that it separates Computing from Storage. This is significant as almost every other Data Warehouse, including Amazon Redshift, combines the two, implying that you must consider the size of your highest workload and then incur the costs associated with it.
Snowflake requires no hardware or software to be Chosen, Installed, Configured, or Managed, making it ideal for organizations that do not want to dedicate resources to the Setup, Maintenance, and Support of In-house Servers. It allows you to store all of your data in a centralized location and size your Compute independently. For example, if you require real-time data loads for complex transformations but only have a few complex queries in your reporting, you can script a massive Snowflake Warehouse for the data load and then scale it back down after it’s finished – all in real-time. This will save you a significant amount of money without jeopardizing your solution goals.
Key Features of Snowflake
Some of the key features of Snowflake are as follows:
- Scalability: The Compute and Storage resources are separated in Snowflakes’ Multi-Cluster Shared Data Architecture. This strategy gives users the ability to scale up resources when large amounts of data is required to be loaded quickly and scale back down when the process is complete without disrupting any kind of operation.
- No Administration Required: It enables businesses to set up and manage a solution without requiring extensive involvement from Database Administrators or IT teams. It does not necessitate the installation of software or the commissioning of hardware.
- Security: Snowflake houses a wide range of security features, from how users access Snowflake to how the data is stored. To restrict access to your account, you can manage Network Policies by whitelisting IP addresses. Snowflake supports a variety of authentication methods, including Two-Factor Authentication and SSO via Federated Authentication.
- Support for Semi-Structured Data: Snowflake’s architecture enables the storage of Structured and Semi-Structured data in the same location by utilizing the VARIANT schema on the Read data type. VARIANT can store both Structured and Semi-structured data. Once the data is loaded, Snowflake automatically parses it, extracts the attributes out of it, and stores it in a Columnar Format.
What is a CI/CD Pipeline?
A CI/CD Pipeline is a set of procedures that must be followed in order to deliver a new version of the software. Continuous Integration/Continuous Delivery (CI/CD) Pipelines are a set of practices aimed at improving software delivery through the use of either a DevOps or a Site Reliability Engineering (SRE) approach. A CI/CD Pipeline incorporates monitoring and automation to improve the Application Development process, particularly during the Integration and Testing phases, as well as during Delivery and Deployment. Although each step of a CI/CD Pipeline can be performed manually, the true value of a CI/CD Pipeline is realized through automation.
Many software development teams are geographically dispersed or isolated, but Continuous Integration (CI) enables rapid development while avoiding Merge Conflicts, Bugs, and Duplication. Continuous Integration always keeps the main branch up to date, but it can also allow for short-term isolated side or feature branches for minor changes that can eventually be merged into the main branch.
Continuous Delivery enables rapid, incremental development and allows development teams to build and release software at any time. It also assists DevOps teams in lowering costs and increasing the speed with which new releases are deployed. Continuous Delivery necessitates a highly repeatable structure and is frequently regarded as an extension of Continuous Integration. Later in this article, you will learn how to build a Snowflake CI CD Pipeline.
What is Azure DevOps?
Azure DevOps is a Software as a Service (SaaS) platform offered by Microsoft that provides an end-to-end DevOps toolchain for developing and deploying software. It also integrates with the majority of the market’s leading tools and is an excellent choice for orchestrating a DevOps toolchain. Azure DevOps offers developer services that enable teams to plan their work, collaborate on code development, and build and deploy applications. Azure DevOps fosters a culture and set of procedures that bring together developers, project managers, and contributors to collaborate on software development. It enables organizations to create and improve products at a much faster rate than traditional software development approaches allow.
What is FlyWay?
FlyWay is an Open-Source tool licensed under Apache License 2.0 that enables users to implement automated and version-based Database Migrations. It enables you to define the necessary update operations in a SQL script or Java code. You can run the database migration from a Command Line Client, as part of your build process, or as part of your Java application.
The key advantage of this process is that FlyWay detects and executes the necessary update operations. As a result, you don’t need to know which SQL update statements must be executed in order to update your current database. You and your colleagues can simply define the update operations that will be used to migrate the database from one version to the next and FlyWay will detect the current version and execute the necessary update operations to update the database.
How to Build a Snowflake CI/CD Pipeline using Azure DevOps and Flyway?
Building a Snowflake CI CD Pipeline is broadly a 4-step process. Follow the steps given below to run and deeply a Snowflake CI CD Pipeline:
Step 1: Create a Demo Project
The first step involved in building a Snowflake CI CD pipeline requires you to create a demo Azure DevOps project. Follow the steps given below to do so:
- Create databases and a user by leveraging the following script:
-- Create Databases
CREATE DATABASE FLYWAY_DEMO COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_DEV COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_QA COMMENT = 'Azure DevOps deployment test';
-- Create a Deploy User
create user devopsuser password='<mypassword>' default_role = sysadmin;
- Sign In to your Azure DevOps account using the appropriate credentials.
- Choose the Organization and click on the Blue-colored +New Project button.
- Give a unique and concise name to your project. You can also add a description for it. Let’s name the project as Snowflake_Flyway for the sake of this tutorial.
- Now, select the Visibility option for your project and click on the Create button.
Step 2: Set up the Production Environment
You must have an Environment in order to add the Approval step. Follow the steps given below to create the necessary Environments and Approvals:
- Head back to the Azure DevOps home page.
- Navigate to the left-side navigation bar and click on the Environments option.
- Give a unique name to the Production Environment and click on the Create button.
- To create Approval for the Production Environment, click on the three vertical dots located next to the Add Resource button.
- Click on the Approvals and Checks option to add a list of Approvers.
Step 3: Create a Library Variable Group
When you have a set of variables that will be used in multiple pipelines, you can create a Variable Group once and reference it in multiple groups. Libraries are used to securely store variables and files that will be used in your Snowflake CI CD pipeline. Follow the steps given below to create a Library Variable Group:
- In the left navigation bar, click on Library present under the Pipelines option.
- On the Library page, navigate to the Variable Groups tab.
- Click on the +Variable Group button to create a new Library Variable Group.
- Give a unique name to the group and add the following variables to it.
SNOWFLAKE_JDBC_URL=jdbc:snowflake://
SNOWFLAKE_ACCOUNT_NAME=<account_name>.<region.cloud_platform>.snowflakecomputing.com
SNOWFLAKE_WAREHOUSE=
SNOWFLAKE_ROLENAME=sysadmin
SNOWFLAKE_DEVOPS_USERNAME=<DeployUserName>
# mark as a secret variable type
SNOWFLAKE_DEVOPS_SECRET=<DeployUserPassword>
SNOWFLAKE_AUTHENTICATOR=snowflake
- Once you have successfully added all of the variables, do not forget to click on the Save button to the right of the Variable Group’s name. This is how your Variable Group would look like:
Step 4: Create and Run a Snowflake CI CD Deployment Pipeline
Now, to create a Snowflake CI CD Pipeline, follow the steps given below:
- In the left navigation bar, click on the Pipelines option.
- If you are creating a pipeline for the first time, click on the Create Pipeline button. In case, you already have another pipeline defined then click on the New Pipeline button.
- On the Connect tab, select the Azure Repos Git option, and select the desired repository(Snowflake_Flyway) on the next screen.
- On the Configure your Pipeline page, select the Starter Pipeline option.
- Lastly, paste the following piece of code into the Review your Final YAML page.
variables:
- group: Snowflake.Database
- name: DBNAME
value: flyway_demo
- name: flywayartifactName
value: DatabaseArtifacts
- name: flywayVmImage
value: 'ubuntu-16.04'
- name: flywayContainerImage
value: 'kulmam92/flyway-azure:6.2.3'
trigger:
- master
stages:
- stage: Build
variables:
- name: DBNAME_POSTFIX
value: _DEV
jobs:
- template: templates/snowflakeFlywayBuild.yml
parameters:
jobName: 'BuildDatabase'
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
- stage: DEV
variables:
- name: DBNAME_POSTFIX
value: _DEV
jobs:
- template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: DEV
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: DEV
- stage: QA
variables:
- name: DBNAME_POSTFIX
value: _QA
jobs:
- template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: QA
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: QA
- stage: PROD
variables:
- name: DBNAME_POSTFIX
value: '' # Empty string for PROD
jobs:
- template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: PROD
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: PROD
- Once you have successfully added the code to the editor, click on the Save and Run button.
Once you follow all the steps explained above in the correct sequence, you will be able to build a Snowflake CI CD Pipeline from scratch!
Conclusion
This blog introduced you to Snowflake along with the salient features that it offers. Furthermore, it introduced you to the steps required to build a Snowflake CI CD Pipeline from scratch using Azure DevOps and Flyway.
As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources. To meet this growing storage and computing needs of data, you would require to invest a portion of your Engineering Bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse such as Snowflake for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.
Visit our Website to Explore Hevo
Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Snowflake, with a few clicks. Hevo Data with its strong integration with 150+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.
Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!
Share your experience of building a Snowflake CI CD Pipeline in the comments below!
Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies, including API integration and machine learning. The combination of technical skills and a flair for writing brought him to the field of writing on highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.