Schema management is crucial for ensuring data quality and consistency in a database. One prominent feature it enables is version control and change management. Version control helps maintain the history of schema versions, allowing an efficient way to track the changes made to the schema.
To achieve this, you can use schemachange, an open-source change management tool. It enables you to manage database changes through DevOps pipelines for a data warehousing platform like Snowflake.
This article discusses the Schema Change Snowflake feature, its best practices, how to set it up, and how to use it with version control tools like GitHub Actions.
An Overview of Schema Change Snowflake
A schema migration tool works with version control systems like Git to automate the deployment of SQL code. It ensures efficient and secure management of database deployments, helping you avoid the challenges commonly faced when deploying code by hand. Some of the most common hurdles include:
- Running scripts with wrong roles and privileges.
- Running a part of the script multiple times.
- Forgetting the manual steps involved before and after running the script.
- Only running specific parts of the script.
Multiple schema migration tools, including schemachange, Flyway, and Liquibase, can help you overcome these hurdles.
Schemachange is a Python-based, community-developed tool for managing Snowflake objects. It works on the principle of database change management (DCM), which employs a set of processes to manage objects within a database.
Schemachange combines version control and CI/CD tools to deploy database changes through a pipeline. Using schemachange effectively can improve data integrity while maintaining consistency across different environments.
How to Create a Change History Table?
The change history table records all the applied change scripts, which helps you track and manage changes made to the database schema. It keeps a log of changes, ensuring each change script is applied only once.
By default, schemachange records its activity in the table METADATA.SCHEMACHANGE.CHANGE_HISTORY. To override the default location and name of the change history table, you can use the -c (--change-history-table) parameter.
You can use the --create-change-history-table flag to have schemachange create the change history schema and table for you if they do not already exist.
To create the change history table yourself using a data definition language (DDL) command, you can use the code below:
CREATE TABLE IF NOT EXISTS SCHEMACHANGE.CHANGE_HISTORY
(
VERSION VARCHAR
,DESCRIPTION VARCHAR
,SCRIPT VARCHAR
,SCRIPT_TYPE VARCHAR
,CHECKSUM VARCHAR
,EXECUTION_TIME NUMBER
,STATUS VARCHAR
,INSTALLED_BY VARCHAR
,INSTALLED_ON TIMESTAMP_LTZ
)
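As a point of reference, here is a minimal command-line sketch showing both parameters in action; the account, user, role, and warehouse values are placeholders, and the connection password is supplied through the SNOWFLAKE_PASSWORD environment variable that schemachange reads:

pip install schemachange
export SNOWFLAKE_PASSWORD='***'  # schemachange reads the connection password from this variable
schemachange -f ./migrations -a MYACCOUNT -u MYUSER -r MYROLE -w COMPUTE_WH -c UTILITY.SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table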
How to Structure Your Repository?
Schemachange walks the root folder recursively, but it does not enforce any particular structure for organizing your files. You can choose a repository structure that suits your requirements. However, there are some best practices you can follow to reduce complexity, such as creating discrete folders according to the specific content saved inside each one.
For the convenience of finding files, you can create three folders inside the root directory, each containing files with different use cases (a sample layout follows this list):
- Always: Scripts in this folder run on every schemachange deployment, which makes them suitable for content you are experimenting with or want to re-apply on every run, such as grants or session settings, rather than for defined objects.
- Repeatable: If you have scripts that should be re-run whenever their contents change, such as view or procedure definitions, you can save them in the Repeatable folder. For this folder, you can follow a database > schema > object type hierarchy.
- Versioned: Scripts in this folder run exactly once, in version order. You can organize this directory into multiple folders to manage different files; however, over time these files can pile up, making it difficult to tell which version a project is on. When organizing the files becomes hard, you can archive older ones into a separate folder.
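Assuming you follow schemachange's script naming conventions, where Always scripts start with A__, Repeatable scripts with R__, and Versioned scripts with V<version>__, a sample layout might look like this (all file names here are illustrative):

scripts
├── always
│   └── A__reapply_grants.sql
├── repeatable
│   └── demo_db
│       └── demo
│           └── views
│               └── R__vw_hello_world.sql
└── versioned
    ├── V1.1.1__initial_objects.sql
    └── V1.1.2__updated_objects.sql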
Snowflake Schema Slowly Changing Dimensions (SCDs)
There are two types of tables in a dimensional model: fact tables and dimension tables. Fact tables store transactional information, while dimension tables store additional information that enriches the fact table. Fact tables change constantly, reflecting your business transactions as they occur.
Dimension tables, however, change only occasionally, such as when a product description or another attribute in your database is updated.
Slowly changing dimensions (SCDs) is a framework for updating and maintaining the data stored in dimension tables as dimensions change. There are multiple SCD types for handling changing dimensions, ranging from SCD type 0 to type 6, each differing in how history is preserved.
How to Set Up Schemachange for Snowflake?
There are certain prerequisites that you must satisfy before setting up schemachange for Snowflake.
Prerequisites
- A Git provider (GitHub, Azure Repos, or Bitbucket).
- A pipeline tool (GitHub Actions, Azure Pipelines, etc.).
- A Snowflake database schema for the schemachange change history table.
- A Snowflake service account with permission to create new objects.
Step 1: Create a directory structure in your repository to host the scripts.
schemachange
├── cicd
└── scripts
    ├── always
    ├── repeatable
    └── versioned
Step 2: Inside the cicd directory, create one configuration file per target environment. Here, those files are tst.yml (test) and prd.yml (production); DEV is left out because developers manage their own development environments.
schemachange
├── cicd
│   └── schemachange_config
│       ├── tst.yml
│       └── prd.yml
└── scripts
    ├── always
    ├── repeatable
    └── versioned
Step 3: The tst.yml and prd.yml will be similar but will have different roles, users, and permissions associated with them.
Same for each environment:
config-version: 1
root-folder: 'scripts'
modules-folder: null
snowflake-account: 'XXXXX' # for example: XXXXX.west-us.azure
snowflake-warehouse: 'COMPUTE_WH'
create-change-history-table: false
autocommit: false
verbose: true
Changed within each environment:
snowflake-user: 'SVC_PRD_SCHEMACHANGE'
snowflake-role: 'PRD_SCHEMACHANGE'
change-history-table: 'UTILITY.SCHEMACHANGE.CHANGE_HISTORY'
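As a hedged sketch of how a pipeline might consume these files, assuming the job copies the target environment's file to schemachange-config.yml (the file name that schemachange's --config-folder option looks for) before running:

cp cicd/schemachange_config/tst.yml schemachange-config.yml  # or prd.yml for production
export SNOWFLAKE_PASSWORD='***'  # schemachange reads the password from this environment variable
schemachange --config-folder .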
How to Use Schemachange?
This section highlights how to build a CI/CD pipeline using GitHub Actions and schemachange.
GitHub Actions lets you build, test, and deploy your code from GitHub, making branch management, code reviews, and issue triaging work the way you need. Before going through the steps, ensure you satisfy the given prerequisites.
Prerequisites:
- A Snowflake account with a database named ‘DEMO_DB’.
- A Snowflake user with appropriate permissions to create objects inside DEMO_DB.
- A GitHub account with a repository.
- Your preferred IDE with GitHub integration.
- The project repository cloned to your computer.
After satisfying the prerequisite conditions, follow the steps below.
Step 1: Create First Database Migration
In your cloned repository, create a new folder, ‘migrations’. Inside this folder, you must create a script V1.1.1__initial_objects.sql with the following content:
CREATE SCHEMA DEMO;
CREATE TABLE HELLO_WORLD
(
FIRST_NAME VARCHAR
,LAST_NAME VARCHAR
);
Commit the new script, and after that, push the changes to your GitHub repository.
Step 2: Create Action Secrets
GitHub Actions secrets let you securely store variables for use in CI/CD pipelines. To create the secrets:
- Click the Settings tab at the top of the repository page.
- On the Settings page, click on the Secrets tab on the left navigation bar.
- Select the Actions secrets.
- Enter the value of each secret by clicking on New repository secret near the top right corner. The workflow in the next step expects six secrets: SF_ACCOUNT, SF_USERNAME, SF_PASSWORD, SF_ROLE, SF_WAREHOUSE, and SF_DATABASE.
After adding all the secrets, the Actions secrets page should list each of the six entries.
Step 3: Create Action Workflow
An Actions workflow represents an automated pipeline. To create an Actions workflow:
- Click on the Actions tab in the top middle of the repository page.
- Click on the set up a workflow yourself link.
- A new workflow page will appear; name the workflow snowflake-devops-demo.yml.
- Inside the Edit new file box, enter the following code:
name: snowflake-devops-demo
Here’s the code that controls when the action will run:
on:
  push:
    branches:
      - main
    paths:
      - 'migrations/**'
Here’s the code that enables you to run the workflow manually from the Actions tab (note that workflow_dispatch is nested under the on: block), followed by the job definition:
  workflow_dispatch:

jobs:
  deploy-snowflake-changes-job:
    runs-on: ubuntu-latest

    steps:
The following step checks out your repository under $GITHUB_WORKSPACE so your job can access it; the remaining steps set up Python, install schemachange, and run it:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Use Python 3.8.x
        uses: actions/setup-python@v2.2.1
        with:
          python-version: 3.8.x

      - name: Run schemachange
        env:
          SF_ACCOUNT: ${{ secrets.SF_ACCOUNT }}
          SF_USERNAME: ${{ secrets.SF_USERNAME }}
          SF_ROLE: ${{ secrets.SF_ROLE }}
          SF_WAREHOUSE: ${{ secrets.SF_WAREHOUSE }}
          SF_DATABASE: ${{ secrets.SF_DATABASE }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SF_PASSWORD }}
        run: |
          echo "GITHUB_WORKSPACE: $GITHUB_WORKSPACE"
          python --version
          echo "Step 1: Installing schemachange"
          pip install schemachange
          echo "Step 2: Running schemachange"
          schemachange -f $GITHUB_WORKSPACE/migrations -a $SF_ACCOUNT -u $SF_USERNAME -r $SF_ROLE -w $SF_WAREHOUSE -d $SF_DATABASE -c $SF_DATABASE.SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table
After entering the commands, click on the green Start commit button near the top right corner of the page and then click on Commit new file. This step will automatically navigate you to the workflow folder in your repository.
Step 4: Run Actions Workflow
To manually run the actions workflow, follow the steps below.
- Click on the Actions tab near the top of the repository page.
- On the left navigation bar, select the name of the workflow snowflake-devops-demo.
- Open the Run workflow drop-down and click the green Run workflow button.
Step 5: Confirm Deployed Changes to Snowflake
The previous steps help you deploy your first database migration. To confirm the process:
- Log in to your Snowflake account.
- Your DEMO_DB database will have a few new objects, including two new schemas, DEMO and SCHEMACHANGE, with two new tables, HELLO_WORLD and CHANGE_HISTORY.
- Inside the CHANGE_HISTORY table, you can see where and how schemachange keeps track of the changes.
- To review the queries executed by schemachange, click on the HISTORY tab in your Snowflake account and look at the Query Tag column (see the example query after this list).
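For example, a hedged query along these lines filters recent query history for schemachange activity; schemachange stamps its queries with a query tag, so adjust the filter to match the tag format your version produces:

SELECT QUERY_TEXT, QUERY_TAG, START_TIME
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE QUERY_TAG ILIKE '%schemachange%'
ORDER BY START_TIME DESC;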
Step 6: Create a Second Database Migration
GitHub Actions can now automate the deployment of SQL code to your Snowflake account. To create a second migration:
- Navigate to your cloned repository in your preferred IDE.
- Create a script V1.1.2__updated_objects.sql inside the migrations folder with the command below.
USE SCHEMA DEMO;
ALTER TABLE HELLO_WORLD ADD COLUMN AGE NUMBER;
- Commit the new script and then push the changes to your repository. Your workflow will automatically start a new run.
- Click on the latest run in your workflow, then click on deploy-snowflake-changes-job and browse the output of each step, particularly the Run schemachange step.
Challenges of Using Schema Change Snowflake
Dependent Views
Schemachange does not order scripts by their logical dependencies. If views depend on each other, deployment can fail. For example, the pair of scripts below is error-prone, as the second view references the first:
--vw_reptiles.sql
CREATE OR REPLACE VIEW VW_REPTILES AS
SELECT *
FROM TABLE_ANIMALS;
--vw_amphibians.sql
CREATE OR REPLACE VIEW VW_AMPHIBIANS AS
SELECT *
FROM VW_REPTILES;
If the same commit contains both views, schemachange will run VW_AMPHIBIANS first, as it executes repeatable files alphabetically. The run fails because VW_REPTILES does not exist yet. To resolve this issue, you have to deploy the two views in separate commits.
Historical Data Load
A situation might arise where you already have database objects and want to backfill your repository with scripts for those existing objects. This can cause problems, as schemachange will try to execute all the new migration files in the repository without checking whether the objects already exist in the database.
To resolve this, you can update the change history table in Snowflake to reflect that the changes have already been applied. For versioned files, identify the current version and insert a record at or above it; for repeatable files, identify the script's hash value and insert records with the correct values.
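For the versioned case, a minimal sketch might look like the INSERT below; the column list comes from the change history DDL shown earlier, and the values are illustrative placeholders you should replace with your actual versions, script names, and service account:

INSERT INTO SCHEMACHANGE.CHANGE_HISTORY
  (VERSION, DESCRIPTION, SCRIPT, SCRIPT_TYPE, CHECKSUM, EXECUTION_TIME, STATUS, INSTALLED_BY, INSTALLED_ON)
VALUES
  -- Mark V1.1.1 as already applied so schemachange skips it on the next run.
  ('1.1.1', 'initial objects', 'V1.1.1__initial_objects.sql', 'V', NULL, 0, 'Success', 'SVC_PRD_SCHEMACHANGE', CURRENT_TIMESTAMP());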
Perform Efficient Data Integration with Hevo
Schemachange efficiently tracks historical changes inside the Snowflake environment. Implementing this kind of change management is a necessary step, especially when you integrate data from your in-house database into Snowflake.
It allows you to understand how data has evolved over time. However, performing data updates in real time can be a daunting task. You can use a SaaS-based platform like Hevo to automate this task for you.
Hevo is a no-code, real-time ELT data integration platform that cost-effectively simplifies data pipeline automation. It automatically detects the incoming data format, replicating the data into a format compatible with a data warehousing platform like Snowflake.
Hevo provides 150+ data source connector options, enabling efficient data integration between different platforms. Here are some key features of Hevo:
- Data transformation: It provides Python-based and drag-and-drop data transformation techniques that can enable you to clean and prepare your data for further analysis.
- Incremental Data Load: Hevo allows modified data transfer, which enables the database changes to be reflected in near real-time. This feature ensures effective bandwidth utilization on both the source and the destination.
Conclusion
This article highlights the use of a prominent schema migration tool, schemachange for Snowflake. You can use this tool to automate the database object deployment process and track deployments through the change history table.
Frequently Asked Questions (FAQs)
Q1. How to perform schema change detection on Snowpipe?
- With CSV input that includes header rows, you can detect newly added fields: read the headers, parse a schema from them, compare it with the landing table's current information schema, and use dynamic SQL to alter the table and load the data (a sketch of the comparison step follows). Alternatively, you can use the schemachange tool to manage the change history.
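As a hedged illustration of the comparison step, a query like the one below reads the landing table's current columns from the information schema; the schema and table names here are hypothetical:

SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'DEMO'
  AND TABLE_NAME = 'LANDING_TABLE'
ORDER BY ORDINAL_POSITION;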
Q2. How to change ownership of schema in Snowflake?
- You can utilize the GRANT OWNERSHIP command to change the ownership of a database. Follow the commands below.
GRANT OWNERSHIP ON DATABASE mydb TO ROLE developer COPY CURRENT GRANTS;
To grant ownership of all the schemas:
GRANT OWNERSHIP ON ALL SCHEMAS IN DATABASE mydb TO ROLE developer COPY CURRENT GRANTS;
To change ownership of all the tables in the database:
GRANT OWNERSHIP ON ALL TABLES IN DATABASE mydb TO ROLE developer COPY CURRENT GRANTS;
Additionally, you can grant ownership of all the views in your database. Follow the command below.
GRANT OWNERSHIP ON ALL VIEWS IN DATABASE mydb TO ROLE developer COPY CURRENT GRANTS;
Q3. How to change a schema to managed access in Snowflake?
- To convert a regular schema into a managed access schema, run the command below as the schema owner (or a role with the required privileges).
ALTER SCHEMA "HT_TRAINING_DB"."TEST" ENABLE MANAGED ACCESS;
Dimple is an experienced Customer Experience Engineer with four years of industry proficiency, including the last two years at Hevo, where she has significantly refined customer experiences within the innovative data integration platform. She is skilled in computer science, databases, Java, and management. Dimple holds a B.Tech in Computer Science and excels in delivering exceptional consulting services. Her contributions have greatly enhanced customer satisfaction and operational efficiency.