Snowflake Tasks: A Comprehensive Guide

on Data Warehouses, Snowflake • January 5th, 2022

Snowflake Tasks

To manage and perform data tasks efficiently nowadays organizations are shifting to Cloud Data Warehouse. Snowflake is one such Data Warehouse solution that can be built on AWS, Azure, or GCP. If you have just completed a mass migration to Snowflake from your On-premises Data Warehouse you might need to know about Snowflake Tasks.

Now that the Schema and Data Components of your migration are complete, the processes that functioned in your previous environment are the final migration type. These workflows primarily consist of Batch Processing Loads from several Data Sources on a set timetable. You will go in deep about how to put these workflows into the new Snowflake environment.

In this article, you will get to know everything about Snowflake Tasks. A Task in Snowflake, like any other database or operating system, is more of a scheduler. Database scheduler, CRON task, and more similar examples come to mind. You can schedule a single SQL statement or a Stored Procedure in Snowflake.

Table of Contents

What is Snowflake?

Snowflake Tasks: logo
Image Source

Snowflake is the world’s first Cloud Data Warehouse solution, built on the infrastructure of a Cloud provider (AWS, Azure, or GCP) of the customer’s choice. Snowflake SQL is consistent with the ANSI standard and includes typical Analytics and windowing capabilities. You’ll notice certain differences in Snowflake’s syntax, but you’ll also see some commonalities. 

Snowflake’s integrated development environment (IDE) is totally Web-based. To interact with your instance, go to XXXXXXXX.us-east-1.snowflakecomputing.com. After logging in, you’ll be sent to the primary Online GUI, which serves as the main IDE, where you can begin interacting with your data assets.

Additionally, each query tab in the Snowflake interface is referred to as a “Worksheet” for convenience. These “Worksheets” are saved automatically and can be accessed at any time, similar to the tab history feature.

Key Features of Snowflake

  • Caching: Snowflake uses a caching mechanism such that when the same query is issued again, the results are rapidly returned from the cache. To avoid re-generation of the report when nothing has changed, Snowflake employs persistent (within the session) query results.
  • Query Optimization: Snowflake can optimize a query on its own by Clustering and Partitioning. There’s no need to be concerned about Query Optimization.
  • Secure Data Sharing: Using Snowflake Database Tables, Views, and UDFs, data can be shared from one account to another.
  • Support to File Formats: Snowflake supports the import of semi-structured data such as JSON, Avro, ORC, Parquet, and XML. It has a column type called VARIANT that allows you to store semi-structured data.
  • Standard and Extended SQL Support: Snowflake provides good ANSI SQL support, as well as advanced SQL features including Merge, Lateral View, Statistical Functions, and many others.
  • Fault-Tolerant: In the event of a failure, Snowflake delivers extraordinary fault-tolerant capabilities to recover the Snowflake object (tables, views, database, schema, and so on).

To get further information check out the official website here

What is Snowflake Tasks?

In simple terms, Snowflake Tasks are schedulers that can assist you in scheduling a single SQL Query or Stored Procedure. When paired with streams to create an end-to-end Data Pipeline, a job can be quite beneficial.

CRON and NON-CRON variant scheduling mechanisms are available in the Snowflake Tasks Engine. If you’re a Linux user, you’ll recognize the CRON variant’s syntax.

At any one time, Snowflake ensures that only one instance of a job with a schedule (i.e. a solo task or the root task in a tree of tasks) is executed. When the next scheduled execution time comes around, if a task is still running, the next scheduled execution time is skipped.

The job is always generated in suspended mode when you create it. As a result, you’ll need to manually resume the task in order to get things moving again. To resume the task, type the following command.

ALTER TASK <TASK_NAME> RESUME;

To avoid Unexpected Task Executions owing to daylight saving time, either: 

  • Do not schedule tasks to run between 1 AM and 3 AM (daily, or on days of the week that include Sundays), or
  • To accommodate for the time shift due to daylight saving time, manually update the Cron Expression for jobs scheduled during certain hours twice a year.

Simplify Snowflake Data Transfer with Hevo’s No-code Pipeline

Hevo Data is a No-code Data Pipeline that helps you transfer data from 100+ sources (including 40+ Free Data Sources) to Snowflake in real-time in an effortless manner. After using Hevo you can easily carry out Snowflake Tasks.

Get Started with Hevo for Free

Key Features of Hevo Data:

  • Fully Managed: Hevo Data is a fully managed service and is straightforward to set up.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.  
  • Ensure Unique Records: Hevo Data helps you ensure that only unique records are present in the tables if Primary Keys are defined.
  • Multiple Sources: Hevo Data has various connectors incorporated with it, which can connect to multiple sources with ease. 
  • Automatic Mapping: Hevo Data automatically maps the source schema to perform analysis without worrying about the changes in the schema.
  • Real-time Data Transfer: Hevo Data works on both batch as well as real-time data transfer. 
  • Resume from Point of Failure: Hevo Data can resume the ingestion from the point of failure if it occurs. 
  • Advanced Monitoring: Advanced monitoring gives you a one-stop view to watch all the activity that occurs within pipelines.
  • 24/7 Support: With 24/7 Support, Hevo provides customer-centric solutions to the business use case.

Steps to load Snowflake data using Hevo Data:

  • Sign up on the Hevo Data, and select Snowflake as the destination.
  • Provide the user credentials and connect to the server.
  • Select the database, and schema to load the data.
Sign up here for a 14-Day Free Trial!

DDL Operations for Snowflake Tasks 

DDL operations based on SQL are performed to process the Snowflake Tasks.

How to Create a Task?

You can replace an existing task or create a new task in the current/specified schema. The following variation is also supported by this command (creates a clone of an existing task):

CREATE TASK … CLONE 

Syntax:

CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
  [ { WAREHOUSE = <string> } | { USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = <string> } ]
  [ SCHEDULE = '{ <num> MINUTE | USING CRON <expr> <time_zone> }' ]
  [ ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE ]
  [ <session_parameter> = <value> [ , <session_parameter> = <value> ... ] ]
  [ USER_TASK_TIMEOUT_MS = <num> ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ AFTER <string> ]
[ WHEN <boolean_expr> ]
AS
  <sql>

How to Describe a Task?

The columns of a Task are explained with the DESCRIBE command. Short-form and command for DESCRIBE are written as: 

Syntax: 

DESC[RIBE] TASK <name>

How to Alter a Task?

The Alter clause changes the attributes of a task that already exists. The keyword for this is:

Syntax: 

ALTER TASK [ IF EXISTS ] <name> RESUME | SUSPEND

ALTER TASK [ IF EXISTS ] <name> REMOVE AFTER <string> | ADD AFTER <string>

ALTER TASK [ IF EXISTS ] <name> SET
  [ WAREHOUSE = <string> ]
  [ SCHEDULE = '{ <number> MINUTE | USING CRON <expr> <time_zone> }' ]
  [ ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE ]
  [ USER_TASK_TIMEOUT_MS = <num> ]
  [ <session_parameter> = <value> [ , <session_parameter> = <value> ... ] ]

ALTER TASK [ IF EXISTS ] <name> UNSET [ <session_parameter> [ , <session_parameter> ... ] ] [ , ... ]

ALTER TASK [ IF EXISTS ] <name> SET TAG <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' ... ]

ALTER TASK [ IF EXISTS ] <name> UNSET TAG <tag_name> [ , <tag_name> ... ]

ALTER TASK [ IF EXISTS ] <name> MODIFY AS <sql>

ALTER TASK [ IF EXISTS ] <name> MODIFY WHEN <boolean_expr>

How to Drop a Task?

Removes a job from the specified/current schema. The keyword for this is:

Syntax: 

DROP TASK [ IF EXISTS ] <name>

How to Schedule a Task?

You can Schedule a Task in Cron Job in a way similar to the sample syntax below which is scheduled to run at 08:05 PM UTC every day:

Snowflake Tasks: Schedule
Image Source

You can see below the meaning of each * if are not aware of Cron Jobs.

# __________ minute (0-59)
# | ________ hour (0-23)
# | | ______ day of month (1-31, or L)
# | | | ____ month (1-12, JAN-DEC)
# | | | | _ day of week (0-6, SUN-SAT, or L)
# | | | | |
# | | | | |
 * * * * *

What is a Tree of Snowflake Tasks? 

You can establish a B-Tree-style Task Structure in Snowflake. Due to Task Dependency, there can only be one Root Task, and all child tasks are related to the root/predecessor tasks (i.e. Before or after). The scheduler for Root Jobs will be defined, and all child tasks will be executed in the order that their dependencies are defined.

Snowflake Tasks: tree
Image Source

The overall number of tasks in a simple task tree is limited to 1000 (including the Root Task). A single precursor task is allowed for each task in the tree; however, a task can have up to 100 Offspring Tasks (i.e. other tasks that identify the task as a predecessor).

In a simple tree of tasks, all tasks must be held by the same task owner, and all objects must be kept in the same Database and Schema.

What is Snowflake Task Overlapping? 

A specific tree of tasks can only run one instance at a time in Snowflake. For example, suppose TASK 1 is set to run every 5 minutes and has two child tasks.

If one of the child’s tasks fails to finish on time and overlaps the following run (after the next 5 minutes), Snowflake ensures that the scheduled next run is skipped at least once.

This is the default behavior, however, it can be modified by setting the following parameter to true during the creation of a Root Job to overlap and start the next run even if the previous Root Task hasn’t finished yet.

This is not advised unless you are certain that your execution has no dependencies or issues with the prior run. During the creation or modification of the Root Task, the following parameter can be set.

To better match a Task Tree with the Root Task’s timetable, do the following:

  • Increase the Scheduling Time between runs of the root task if possible.
  • Consider expanding the Warehouse that runs the Task Tree’s massive or complicated SQL Statements or Stored Procedures.
  • Examine the SQL Statements or Stored Procedures that each task uses. Check to see if the code may be modified to take advantage of Parallel Processing.

Understanding Snowflake Tasks Versioning for Runs

When you run Snowflake Tasks, they produce a copy of itself that contains all of the Properties for all of the Tasks in the tree. Until it is completed, the Root Job operates under the same version of Code and Setup.

If you try to change any of the Child Task Code that hasn’t run yet, it won’t let you unless the Root Task is suspended first.

However, if you attempt to edit the Root Job, it will enable you to do so, but it will then be suspended, canceling all future planned runs of the Root Task. The Root Task must be restarted in order to see the reflection of changes for a Task.

If the definition of a Stored Procedure is invoked by a Task change while the Tree of Tasks is running, the new programming may be executed when the Stored Procedure is invoked by the Task in the current run.

What are the Use Cases/Scenarios for Snowflake Tasks?

How to Run a Task at Pre-determined Frequency?

Every 1 minute, every 30 minutes, and so on, the task can be completed.

Step A: Create a Task and Check

Making a Task that every 1 minute writes the current Timestamp and a constant value into a table: 

Snowflake Tasks: Create
Image Source
Snowflake Tasks: Show
Image Source

The job will always be in “Suspended” mode, meaning it will not start unless you “Resume” it. 

Step B: Resume Task and Check

The timer will start as soon as you continue it, and the Task will appear in your schedule. In this scenario, it will insert a record into the table after one minute and repeat the procedure every minute until you “Suspend” it again.

 -- resume / suspend
ALTER TASK load_user_table RESUME 
Snowflake Tasks: resume
Image Source

You will see, one new row is added to the table every minute. Now, “Suspend” the Task to prevent it from adding any more rows to the table.

 -- resume / suspend
ALTER TASK load_user_table SUSPEND

The work will now be put on hold until you Resume it.

How to Run a Task in Cron Jobs [Pre-determined Time]?

Let’s proceed explanation for this type of Task with an example. Let’s create a Job that runs at 8:50 a.m. UTC every day and only writes the current timestamp and a constant value into a table once each day.

Step A: Create a Task and Check

Snowflake Tasks: create con job
Image Source

See if the Task was successfully created.

Snowflake Tasks: show
Image Source

Step B: Resume the Task and Check

Keep in mind that the Task is always created in Suspend mode

Now, every day at 8:50 UTC, one record will be added.

Snowflake Tasks: con job o/p
Image Source

You have to set the Task to “Start” such that it adds one entry into the Database every day at 8:50 a.m. UTC. You can also use this method to schedule a Stored Operation.

How to Execute Stored Procedures in Tasks?

Here, you need to make a Task that invokes a Stored Procedure.

Step A: Creating Stored Procedures

Inside this Stored Procedure, use the same “Insert into table” line. In actuality, you’ll have more Advanced Procedures. In Snowflake Stored Procedures, you can employ “Javascript” functions.

Snowflake Tasks: stored procedure
Image Source

Step B: Create a Task and Check

Snowflake Tasks: create task
Image Source

See if the Task was successfully created.

Step C: Resume Task and Check

Keep in mind that the Task is always created in Suspend mode

Check in the USER_NAME filed you will have: “from_stored_procedure”.

Snowflake Tasks: resume and check
Image Source

How to Execute a Tree of Snowflake Tasks?

Let’s take the same example as above. Create a basic Task Tree by defining the current “load_user_table_by_cron_job” task as the predecessor task, which, when run successfully, activates the new “after_cron_job” task. The new task updates our “user” table with the current timestamp and a constant AFTER CRON JOB value.

Step A: Create a Task and Check

Snowflake Tasks: create after cron
Image Source

See if the Task was successfully created.

Step B: Resume Task and Check

Keep in mind that the Task is always created in Suspend mode

Snowflake Tasks: o/p
Image Source

Daily one record will be inserted after 8:50 UTC (following the execution of “load user table by cron job“).

Row No. 18 was inserted on 23 May after the task “load_user_table_by_cron_job” was completed, and Row No. 20 was inserted on 24 May after the task “load_user_table_by_cron_job” was completed.

Snowflake Tasks Security

1) Access Control Privileges

A) Creating Tasks

To create tasks, you’ll need a role with at least the following privileges:

Snowflake Tasks: creating tasks objects
Image Source

B) Owning Tasks

The Task Owner (i.e. the role with the OWNERSHIP privilege on the task) must have the following privileges after a Task is created:

Snowflake Tasks: owning tasks objects
Image Source

In addition, the role must have the necessary permissions to execute the task’s SQL Statement.

C) Suspending or Resuming Tasks

A role with the OPERATE privilege for the Task, in addition to the Task Owner, can suspend or restart the task. The USAGE privilege on the Database and Schema containing the job is required for this role. There are no other requirements.

Snowflake validates that the Task Owner role has the privileges indicated in Owning Tasks when a task is resumed (in this topic).

2) Assigning Task Administrator 

You can define a custom role (for example, taskadmin) and assign the EXECUTE TASK privilege to this role for simplicity of use. Any role that has the ability to grant privileges (for example, SECURITYADMIN or any role with the MANAGE GRANTS privilege) can then provide this custom role to any task owner role to allow them to edit their own tasks.

It is only necessary to revoke this custom role from the task owner role to remove the task owner role’s ability to execute the task. If you don’t want to create this custom role, an account administrator must revoke the task owner role’s EXECUTE TASK privilege.

For example, create a custom role called taskadmin and grant it the EXECUTE TASK privilege. Assign the taskadmin role to the myrole Task Owner role:

use role securityadmin;

create role taskadmin;

-- set the active role to ACCOUNTADMIN before granting the account-level privileges to the new role
use role accountadmin;

grant execute task, execute managed task on account to role taskadmin;

-- set the active role to SECURITYADMIN to show that this role can grant a role to another role
use role securityadmin;

grant role taskadmin to role myrole;

3) Dropping Task Admin Role

When a Task’s Owner role (the role having the OWNERSHIP privilege on the task) is removed, the task is “re-possessed” by the role that removed the owner role. This guarantees that ownership is transferred to a role closer to the top of the hierarchy.

When a Task is re-possessed, it is automatically paused, which means that all existing executions are complete, but no further executions are scheduled until the job is explicitly resumed by the new owner.

The goal is to prevent a user who has access to a specific role from leaving behind tasks that execute with higher permissions when the role is deleted.

The task completes processing under the dropped role if the role that a running task is executing under is dropped while the task is running.

Conclusion

This article has exposed you to the various Snowflake Tasks to help you improve your overall decision-making and experience when trying to make the most out of your data. In case you want to export data from a source of your choice into your desired Database/destination like Snowflake, then Hevo Data is the right choice for you! 

Visit our Website to Explore Hevo

Hevo Data provides its users with a simpler platform for integrating data from 100+ sources for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouses such as Snowflake, Database, or a destination of your choice. You can easily carry out Snowflake Tasks after using Hevo. It provides you with a consistent and reliable solution to managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!

Share your experience of learning about Snowflake Tasks! Let us know in the comments section below!

Data Engineering
Survey 2022
Calling all data engineers – fill out this short survey to help us build an industry report for our data engineering community.
TAKE THE SURVEY
Amazon Gift Cards of $25 each are on offer for all valid and complete survey submissions.