The acronyms CI and CD are frequently used in modern development practices and DevOps. Continuous integration (CI) refers to a fundamental DevOps best practice in which developers frequently merge code changes into a central repository where automated builds and tests are run. On the other hand, Continuous Delivery (CD) is an extension of Continuous Integration as it automatically deploys all code changes to the testing and/or production environment following the build stage.
Upon a complete walk-through of this article, you will gain a decent understanding of Snowflake along with the key features that it offers. This article will also provide you with a step-by-step guide on how to build a Snowflake CI CD Pipeline in a seamless manner. Read along to learn more about Snowflake CI CD Pipeline.
Prerequisites
- Hands-on experience with Git.
- An active Snowflake account.
- An active Azure DevOps Services account.
What is Snowflake?
Snowflake is one of the most popular Cloud Data Warehouses that offers a plethora of features without compromising simplicity. It scales automatically, both up and down, to offer the best Performance-to-Cost ratio. The distinguishing feature of Snowflake is that it separates Compute from Storage. This is significant because most other Data Warehouses, including traditional Amazon Redshift clusters, couple the two, which means you must size for your largest workload and then incur the costs associated with it even when that capacity is idle.
Snowflake requires no hardware or software to be Chosen, Installed, Configured, or Managed, making it ideal for organizations that do not want to dedicate resources to the Setup, Maintenance, and Support of In-house Servers. It allows you to store all of your data in a centralized location and size your Compute independently.
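For example, resizing compute is a single statement that does not touch your stored data. The warehouse name below is purely illustrative:
-- Scale an existing virtual warehouse up for a heavy load, then back down when finished
ALTER WAREHOUSE DEMO_WH SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE DEMO_WH SET WAREHOUSE_SIZE = 'XSMALL';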
Key Features of Snowflake
Some of the key features of Snowflake are as follows:
- Scalability: The Compute and Storage resources are separated in Snowflake’s Multi-Cluster Shared Data Architecture. This strategy gives users the ability to scale up resources when large amounts of data need to be loaded quickly and scale back down when the process is complete, without disrupting running operations.
- No Administration Required: It enables businesses to set up and manage a solution without requiring extensive involvement from Database Administrators or IT teams. It does not necessitate the installation of software or the commissioning of hardware.
- Security: Snowflake houses a wide range of security features, from how users access Snowflake to how the data is stored. To restrict access to your account, you can manage Network Policies by whitelisting IP addresses. Snowflake supports a variety of authentication methods, including Two-Factor Authentication and SSO via Federated Authentication.
- Support for Semi-Structured Data: Snowflake’s architecture enables the storage of Structured and Semi-Structured data in the same location by utilizing the VARIANT data type, which follows a schema-on-read approach. A VARIANT column can store both Structured and Semi-Structured data. Once the data is loaded, Snowflake automatically parses it, extracts the attributes, and stores them in a Columnar Format, as illustrated in the example below.
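As a small illustration of schema-on-read (the table and JSON contents are made up for this example):
-- Land raw JSON in a VARIANT column, then query attributes with path notation
CREATE OR REPLACE TABLE RAW_EVENTS (EVENT VARIANT);
INSERT INTO RAW_EVENTS
SELECT PARSE_JSON('{"user": "alice", "action": "login", "device": {"os": "linux"}}');
SELECT EVENT:user::STRING      AS user_name,
       EVENT:device.os::STRING AS device_os
FROM RAW_EVENTS;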
Hevo Data is now available on Snowflake Partner Connect, making it easier than ever to integrate your data seamlessly. With Hevo’s powerful data integration capabilities, Snowflake users can connect to Hevo directly from their Snowflake environment and streamline their data pipelines effortlessly. Hevo offers:
- More than 150 source connectors from databases, SaaS applications, etc.
- A simple Python-based drag-and-drop data transformation technique that allows you to transform your data for analysis.
- Automatic schema mapping to match the destination schema with the incoming data. You can also choose between Full and Incremental Mapping.
- Transparent pricing with no hidden fees allows you to budget effectively while scaling your data integration needs.
Hevo has been rated 4.7/5 on Capterra. Know more about our 2000+ customers and give us a try.
What is a CI/CD Pipeline?
A CI/CD Pipeline is a set of procedures that must be followed in order to deliver a new version of the software. Continuous Integration/Continuous Delivery (CI/CD) Pipelines are a set of practices aimed at improving software delivery through the use of either a DevOps or a Site Reliability Engineering (SRE) approach. A CI/CD Pipeline incorporates monitoring and automation to improve the Application Development process, particularly during the Integration and Testing phases, as well as during Delivery and Deployment. Although each step of a CI/CD Pipeline can be performed manually, the true value of a CI/CD Pipeline is realized through automation.
Many software development teams are geographically dispersed or isolated, but Continuous Integration (CI) enables rapid development while avoiding Merge Conflicts, Bugs, and Duplication. Continuous Integration always keeps the main branch up to date, but it can also allow for short-term isolated side or feature branches for minor changes that can eventually be merged into the main branch.
Continuous Delivery enables rapid, incremental development and allows development teams to build and release software at any time. It also assists DevOps teams in lowering costs and increasing the speed with which new releases are deployed. Continuous Delivery necessitates a highly repeatable structure and is frequently regarded as an extension of Continuous Integration. Later in this article, you will learn how to build a Snowflake CI CD Pipeline.
What is Azure DevOps?
Azure DevOps is a Software as a Service (SaaS) platform offered by Microsoft that provides an end-to-end DevOps toolchain for developing and deploying software. It also integrates with the majority of the market’s leading tools and is an excellent choice for orchestrating a DevOps toolchain. Azure DevOps offers developer services that enable teams to plan their work, collaborate on code development, and build and deploy applications. Azure DevOps fosters a culture and set of procedures that bring together developers, project managers, and contributors to collaborate on software development. It enables organizations to create and improve products at a much faster rate than traditional software development approaches allow.
What is Flyway?
Flyway is an Open-Source tool licensed under Apache License 2.0 that enables users to implement automated and version-based Database Migrations. It allows you to define the necessary update operations in a SQL script or Java code. You can run the database migration from the Command-Line Client, as part of your build process, or from within your Java application.
The key advantage of this approach is that Flyway detects and executes the necessary update operations. As a result, you don’t need to know which SQL update statements must be executed in order to bring your current database up to date. You and your colleagues simply define the update operations that migrate the database from one version to the next, and Flyway detects the current version and executes whatever operations are still pending.
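To make that concrete, each update operation lives in its own versioned script that follows Flyway’s V<version>__<description>.sql naming convention. The scripts below are purely illustrative:
-- V1__create_customer_table.sql
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   NUMBER NOT NULL,
    CUSTOMER_NAME VARCHAR(200)
);
-- V2__add_customer_email.sql
ALTER TABLE CUSTOMER ADD COLUMN CUSTOMER_EMAIL VARCHAR(320);
Flyway records every applied version in its schema history table, so running a migration against a database that is already on V1 executes only V2.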
How to Build a Snowflake CI/CD Pipeline using Azure DevOps and Flyway?
Building a Snowflake CI CD Pipeline is broadly a 4-step process. Follow the steps given below to build and deploy a Snowflake CI CD Pipeline:
Step 1: Create a Demo Project
The first step involved in building a Snowflake CI CD pipeline requires you to create a demo Azure DevOps project. Follow the steps given below to do so:
- Create the required databases and a deploy user by leveraging the following script (see the note on role grants after this list):
-- Create Databases
CREATE DATABASE FLYWAY_DEMO COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_DEV COMMENT = 'Azure DevOps deployment test';
CREATE DATABASE FLYWAY_DEMO_QA COMMENT = 'Azure DevOps deployment test';
-- Create a Deploy User
CREATE USER devopsuser PASSWORD='<mypassword>' DEFAULT_ROLE = SYSADMIN;
- Sign In to your Azure DevOps account using the appropriate credentials.
- Choose the Organization and click on the Blue-colored +New Project button.
- Give a unique and concise name to your project. You can also add a description for it. Let’s name the project as Snowflake_Flyway for the sake of this tutorial.
- Now, select the Visibility option for your project and click on the Create button.
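A note on role grants: the script in this step sets devopsuser’s default role to SYSADMIN, but Snowflake does not grant that role to the user automatically. Depending on how your account is configured, you may also need grants along the lines of the sketch below before the pipeline can connect and deploy; the warehouse name is hypothetical, so adjust it to your environment.
-- Grant the deploy role and warehouse access to the deploy user (names are illustrative)
GRANT ROLE SYSADMIN TO USER devopsuser;
GRANT USAGE ON WAREHOUSE DEMO_WH TO ROLE SYSADMIN;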
Step 2: Set up the Production Environment
You must have an Environment in order to add the Approval step. Follow the steps given below to create the necessary Environments and Approvals:
- Head back to the Azure DevOps home page.
- Navigate to the left-side navigation bar and click on the Environments option.
- Give a unique name to the Production Environment and click on the Create button.
- To create Approval for the Production Environment, click on the three vertical dots located next to the Add Resource button.
- Click on the Approvals and Checks option to add a list of Approvers.
Step 3: Create a Library Variable Group
When you have a set of variables that will be used in multiple pipelines, you can create a Variable Group once and reference it in each of those pipelines. Libraries are used to securely store variables and files that will be used in your Snowflake CI CD pipeline. Follow the steps given below to create a Library Variable Group:
- In the left navigation bar, click on Library present under the Pipelines option.
- On the Library page, navigate to the Variable Groups tab.
- Click on the +Variable Group button to create a new Library Variable Group.
- Give a unique name to the group and add the following variables to it. Note that the pipeline YAML in Step 4 references a Variable Group named Snowflake.Database, so either use that name here or update the YAML accordingly.
SNOWFLAKE_JDBC_URL=jdbc:snowflake://
SNOWFLAKE_ACCOUNT_NAME=<account_name>.<region.cloud_platform>.snowflakecomputing.com
SNOWFLAKE_WAREHOUSE=
SNOWFLAKE_ROLENAME=sysadmin
SNOWFLAKE_DEVOPS_USERNAME=<DeployUserName>
# mark as a secret variable type
SNOWFLAKE_DEVOPS_SECRET=<DeployUserPassword>
SNOWFLAKE_AUTHENTICATOR=snowflake
- Once you have successfully added all of the variables, do not forget to click on the Save button to the right of the Variable Group’s name.
Step 4: Create and Run a Snowflake CI CD Deployment Pipeline
Now, to create a Snowflake CI CD Pipeline, follow the steps given below:
- In the left navigation bar, click on the Pipelines option.
- If you are creating a pipeline for the first time, click on the Create Pipeline button. If you already have another pipeline defined, click on the New Pipeline button instead.
- On the Connect tab, select the Azure Repos Git option, and select the desired repository (Snowflake_Flyway) on the next screen.
- On the Configure your Pipeline page, select the Starter Pipeline option.
- Lastly, paste the following piece of code into the Review your Final YAML page.
variables:
- group: Snowflake.Database
- name: DBNAME
  value: flyway_demo
- name: flywayartifactName
  value: DatabaseArtifacts
- name: flywayVmImage
  value: 'ubuntu-16.04'
- name: flywayContainerImage
  value: 'kulmam92/flyway-azure:6.2.3'

trigger:
- master

stages:
- stage: Build
  variables:
  - name: DBNAME_POSTFIX
    value: _DEV
  jobs:
  - template: templates/snowflakeFlywayBuild.yml
    parameters:
      jobName: 'BuildDatabase'
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
- stage: DEV
  variables:
  - name: DBNAME_POSTFIX
    value: _DEV
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: DEV
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: DEV
- stage: QA
  variables:
  - name: DBNAME_POSTFIX
    value: _QA
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: QA
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: QA
- stage: PROD
  variables:
  - name: DBNAME_POSTFIX
    value: '' # Empty string for PROD
  jobs:
  - template: templates/snowflakeFlywayDeploy.yml
    parameters:
      jobName: PROD
      databaseName: $(DBNAME)
      databasePostfix: $(DBNAME_POSTFIX)
      artifactName: $(flywayartifactName)
      vmImage: $(flywayVmImage)
      containerImage: $(flywayContainerImage)
      environmentName: PROD
- Once you have successfully added the code to the editor, click on the Save and Run button.
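After the run completes, you can verify the deployment directly in Snowflake. Assuming the Flyway templates use Flyway’s default history table in the PUBLIC schema, a quick check looks like this (a sketch for verification, not part of the pipeline itself):
-- Inspect which migrations Flyway has applied to the DEV copy of the database
USE DATABASE FLYWAY_DEMO_DEV;
SELECT version, description, installed_on, success
FROM PUBLIC.flyway_schema_history
ORDER BY installed_rank;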
Once you follow all the steps explained above in the correct sequence, you will be able to build a Snowflake CI CD Pipeline from scratch!
Conclusion
This blog introduced you to Snowflake along with the salient features that it offers. Furthermore, it introduced you to the steps required to build a Snowflake CI CD Pipeline from scratch using Azure DevOps and Flyway.
As your business begins to grow, data is generated at an exponential rate across all of your company’s SaaS applications, Databases, and other sources. To meet these growing storage and compute needs, you would have to invest a portion of your Engineering Bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it to a Cloud Data Warehouse such as Snowflake for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.
Hevo Data, a No-code Data Pipeline, provides a consistent and reliable solution for managing data transfer between various sources and Desired Destinations, such as Snowflake, with a few clicks. Sign up for Hevo’s 14-day free trial and experience seamless data migration.
FAQs
What is Snowflake CI/CD?
Snowflake CI/CD refers to integrating Snowflake with continuous integration and continuous delivery pipelines, enabling automated deployment of data models, transformations, and database changes to enhance data workflows.
Is Snowflake a DevOps tool?
Snowflake is a cloud-based data warehousing platform, but it can be integrated into DevOps practices for data management and analytics.
Why use Snowflake instead of AWS?
Snowflake is often preferred over building a warehouse directly on AWS services for its ease of use, automatic scaling, and multi-cloud flexibility, though the right choice depends on your existing stack, workloads, and pricing considerations.
Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies, including API integration and machine learning. The combination of technical skills and a flair for writing brought him to the field of writing on highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.