Understanding Azure Data Factory Schedules Simplified 101

on Azure Data Factory, ETL, ETL Tools, Microsoft, Microsoft Azure, Schedule Triggers, Tutorials • April 19th, 2022


Companies store and analyze their business data to make smarter decisions. Their data is often scattered across a multitude of sources that need to be unified and analyzed to generate insights. Data Pipelines are required to load data from these sources into a Data Warehouse.

Azure Data Factory is an ETL tool by Microsoft that helps users create and manage Data Pipelines and perform ETL processes. Users can automate their Data Pipeline jobs by using Azure Data Factory Schedules. With Azure Data Factory Schedule Triggers, you can schedule a pipeline to run periodically.

In this article, you will learn about Azure Data Factory Schedules and their components. You will also go through the process of creating Azure Data Factory Schedule Triggers, Functions, and Executions.


What is Azure Data Factory?


Azure Data Factory (ADF) is a Microsoft Azure data pipeline orchestrator and ETL tool. ADF can take data from external data sources (FTP, Amazon S3, Oracle, and a variety of other sources), transform it, filter it, enrich it, and load it to a new location.

Data is loaded and transformed between different data repositories and computational resources using Azure Data Factory. Data-driven processes (also known as pipelines) can be created and scheduled to ingest data from a variety of sources.

Data Factory’s objective is to retrieve data from one or more data sources and transform it into a format that can be processed. The data sources may deliver data in a variety of ways and may contain noise that must be removed. If the data of interest isn’t in a format that the other services in the warehousing solution can process, you can transform it.

Azure Data Factory Control Flow

Certain terminology comes up in any tip or tutorial on ADF. Here’s a basic rundown:

  • Linked Service: This defines the connection to a data source. For example, it could be a SQL Server connection string or the URL of a SharePoint site.
  • Dataset: While a linked service specifies where data may be found, the dataset specifies how that data should appear. In SQL Server’s case, a dataset will describe a table and its columns. In the context of a CSV file in Azure Blob Storage, the dataset will describe the columns of the CSV file, the encoding, whether a header is used, the delimiter, and so on.
  • Integration Runtime: This is the computing environment. ADF is a cloud service that requires some computing power to transport your data; the Integration Runtime (IR) takes care of this. There are a few different runtimes to choose from. You can use the Azure-IR to execute your computation in a scalable and elastic way, or use the self-hosted IR to perform the computation on your own servers. The final option is the Azure-SSIS IR, which is used to execute Integration Services packages in ADF.
  • Pipeline: ADF’s key component is the pipeline. A pipeline is a container for one or more activities. Copying data from a source to a sink, executing a stored procedure, executing PowerShell code, executing Python code, and running a big data job are all examples of activities. Activities can be chained with success, failure, or completion dependencies. Pipelines can then be scheduled to run or be triggered by certain events, such as the creation of a blob in a blob container.
  • Data Flows: A unique pipeline activity with its own set of editors. Data can be read in, transformed in memory, and then written to a destination using a data flow. A (mapping) data flow and a Power Query data flow are the two sorts of data flows. Data flows are more appropriate for big data scenarios because they leverage Azure Databricks behind the scenes.
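To make these relationships concrete, here is a minimal sketch in Python of how the building blocks reference each other. All names here (BlobLinkedService, SalesCsvDataset, CopySalesPipeline) are hypothetical examples for illustration, not real ADF resources; actual definitions are JSON documents managed by the service.

```python
# Illustrative sketch: how ADF building blocks nest and reference each other.
# All resource names below are hypothetical.
linked_service = {
    "name": "BlobLinkedService",   # where the data lives (connection info)
    "type": "AzureBlobStorage",
}

dataset = {
    "name": "SalesCsvDataset",     # how the data looks (format, columns)
    "linkedServiceName": "BlobLinkedService",
    "format": {"type": "DelimitedText", "delimiter": ",", "firstRowAsHeader": True},
}

pipeline = {
    "name": "CopySalesPipeline",   # container for one or more activities
    "activities": [
        {"name": "CopySales", "type": "Copy", "inputs": ["SalesCsvDataset"]}
    ],
}

# A dataset points at a linked service; a pipeline activity points at a dataset.
assert dataset["linkedServiceName"] == linked_service["name"]
assert pipeline["activities"][0]["inputs"][0] == dataset["name"]
print("hierarchy wired:", pipeline["name"])
```

The point of the sketch is only the chain of references: activities consume datasets, and datasets resolve their location through a linked service.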

Key Features of Azure Data Factory

Some of the main features of Azure Data Factory are listed below:

  • Scalability: ADF can handle huge volumes of data with ease due to its in-built parallelism and time-slicing features, which help move gigabytes of data into the Cloud in a matter of hours.
  • Minimal Coding Required: Azure Data Factory configuration is based on JSON files and users can also create components from the Azure Portal without doing much coding.
  • Security: ADF automatically encrypts data-in-transit between on-premises and cloud sources.

To learn more about Azure Data Factory, click here.

Load Data Seamlessly Using Hevo’s No Code Data Pipeline

Hevo Data, an Automated No Code Data Pipeline, can help you automate, simplify & enrich your data replication process in a few clicks. With Hevo’s wide variety of connectors and blazing-fast Data Pipelines, you can extract & load data from 100+ Data Sources straight into your Data Warehouse or any Database. To further streamline and prepare your data for analysis, you can process and enrich raw granular data using Hevo’s robust & built-in Transformation Layer without writing a single line of code!

Get Started with Hevo for Free

Hevo is the fastest, easiest, and most reliable data replication platform that will save your engineering bandwidth and time multifold. Try our 14-day full access free trial today!

Creating Azure Data Factory Schedule Triggers

This Azure Data Factory Trigger is the most widely used trigger that can schedule the execution of a Data Pipeline. You can choose the start and finish time for the Azure Data Factory Schedule Trigger to be active, and it will only run a Pipeline within that time period.

It gives you more options by allowing you to schedule in minute(s), hour(s), day(s), week(s), or month(s) intervals. Azure Data Factory pipelines are executed on a wall-clock schedule using the Azure Data Factory Schedule trigger.

When specifying when the pipeline will be executed, how frequently it will be executed, and optionally its end date, you must also indicate the reference time zone used to interpret the Azure Data Factory Schedule trigger’s start and end dates.

You can also set the Azure Data Factory Schedule Trigger to run on specific dates and times in the future, such as the 30th of each month, the first and third Mondays of each month, and so on.

Azure Scheduler allows you to run jobs on any schedule, such as accessing HTTP/S endpoints or pushing messages to Azure Storage queues, making it suitable for recurring tasks such as log cleanup, backups, and other maintenance jobs. You can integrate tasks into your apps that execute immediately or at any time in the future, and call services either in or out of Azure.

  1. Navigate to the Data Factory Edit tab or the Azure Synapse Integrate tab.
Switch to Edit tab
  2. From the menu, choose Trigger, then New/Edit.
Add New Trigger
  3. On the Add Triggers page, select Choose trigger, then + New.
Add triggers - new trigger
  4. Complete the following steps on the New Trigger page:
  • Ensure that the Azure Data Factory Schedule is selected as the Type.
  • Enter the trigger’s start date and time in the Start Date field. By default, it is set to the current DateTime in Coordinated Universal Time (UTC).
  • Select the time zone in which the trigger will be created. Please keep in mind that the Trigger’s scheduled execution time will be considered after the Start Date (make sure the Start Date is at least 1 minute prior to the Execution time).

If the pattern is set to Days or higher and the time zone observes daylight saving, the Azure Data Factory Schedule trigger time will automatically adjust for the twice-yearly change.

To avoid daylight saving time changes, choose a time zone that does not observe daylight saving, such as UTC. Next, set the Recurrence for the trigger: choose one of the values from the drop-down menu and enter the multiplier in the text box.

For example, if you want the trigger to run once every 15 minutes, choose Every Minute and type 15 into the text box. If you select “Day(s), Week(s), or Month(s)” from the Recurrence drop-down, you will see “Advanced recurrence options.”
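As a sketch of how such a recurrence unfolds, the following standard-library Python snippet lists the first few fire times for an every-15-minutes trigger; the start time is a hypothetical value, and this is a simplified model (the service's exact first-run semantics depend on the configured start time and time zone):

```python
from datetime import datetime, timedelta

def fire_times(start, interval_minutes, count):
    """Return the first `count` wall-clock fire times for a minute-based recurrence."""
    return [start + timedelta(minutes=interval_minutes * i) for i in range(count)]

# Hypothetical start date: fires at the start time, then every 15 minutes.
start = datetime(2022, 4, 19, 6, 0)
runs = fire_times(start, 15, 4)
print([t.strftime("%H:%M") for t in runs])  # ['06:00', '06:15', '06:30', '06:45']
```

Choosing Every Hour with multiplier 3 would stretch the same arithmetic to a three-hour spacing.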

Advanced recurrence options of Day(s), Week(s) or Month(s)

To specify an end date and time in the Azure Data Factory Schedule trigger, select the “Specify an End Date” option, then Ends On, then OK. Each pipeline run has an associated cost.

If you’re testing, you might want to limit the number of times the pipeline is triggered. But even so, make sure that there is enough time between the publish time and the end time for the pipeline to run.

The Azure Data Factory Schedule trigger becomes active only after you publish the solution, not after you save the trigger in the UI.

Trigger settings
Trigger Settings for End Date
  5. Check the box next to Activated in the New Trigger window, then click OK. This checkbox allows you to disable the trigger later.
Trigger settings - Next button
  6. Examine the warning notification in the New Trigger window before clicking OK.
Trigger settings - Finish button
  7. Select Publish all to publish the changes. Until you publish the changes, the trigger does not start triggering pipeline runs.
Publish button
  8. Navigate to the Pipeline runs tab on the left, then click Refresh to update the list. The pipeline runs that were triggered by the schedule trigger will be displayed. Take a look at the Triggered By column values. When you select Trigger Now, the manual trigger run will appear in the list.
  9. Navigate to the Trigger Runs Schedule view.
Monitor trigger runs

What Makes Hevo’s ETL Process Unique

Providing a high-quality ETL solution can be a difficult task if you have a large volume of data. Hevo Data’s automated, No-code platform empowers you with everything you need for a smooth data replication experience.

Check out what makes Hevo amazing:

  • Fully Managed: Hevo requires no management and maintenance as it is a fully automated platform.
  • Data Transformation: Hevo provides a simple interface to perfect, modify, and enrich the data you want to transfer.
  • Faster Insight Generation: Hevo offers near real-time data replication so you have access to real-time insight generation and faster decision making. 
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (with 40+ free sources) that can help you scale your data infrastructure as required.
  • Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-day free trial!

Creating Azure Data Factory Schedule Function

Azure Functions lets users schedule the execution of a function with a timer trigger, which activates the function at a predetermined time. The schedule is defined using CRON expressions.

When we create a timer-triggered function, we must specify a time in CRON format that determines when the trigger will execute. In Azure Functions, the CRON expression is divided into six parts: second, minute, hour, day, month, and weekday, each separated by a space. Let’s create a timer-triggered function.

  1. Open your function app and go to Functions, then + Add.
Add a function in the Azure portal.
  2. Choose the Timer trigger template from the drop-down menu.
Select the timer trigger in the Azure portal
  3. Select Create Function after configuring the new timer trigger with the settings listed in the table below.
Screenshot shows the New Function page with the Timer Trigger template selected.
  • Name (suggested value: Default): Defines the name of your timer-triggered function.
  • Schedule (suggested value: 0 */1 * * * *): A six-field CRON expression that schedules your function to run every minute.
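A quick way to sanity-check such an expression is to split it into its six fields. This standard-library Python sketch labels each field of the suggested value:

```python
def cron_fields(expr):
    """Split a six-field Azure Functions CRON expression into labeled parts."""
    labels = ["second", "minute", "hour", "day", "month", "weekday"]
    parts = expr.split()
    if len(parts) != 6:
        raise ValueError("Azure Functions timer expressions have six fields")
    return dict(zip(labels, parts))

print(cron_fields("0 */1 * * * *"))
# {'second': '0', 'minute': '*/1', 'hour': '*', 'day': '*', 'month': '*', 'weekday': '*'}
```

By the same logic, an expression such as `0 0 */1 * * *` (second 0, minute 0, every hour) fires once an hour instead of once a minute.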

Testing the Function

1. In your function, select Code + Test and expand the logs.

Test the timer trigger in the Azure portal

2.  Examine the data written to the logs to ensure proper execution.

View the timer trigger in the Azure portal

Now, instead of running every minute, you can adjust the function’s schedule so that it runs once every hour.

Azure Data Factory Pipeline Execution

A schedule trigger causes pipelines to run on a wall-clock schedule. This Azure Data Factory Schedule trigger can be configured with both periodic and advanced calendar options. For example, it can be set to run "weekly" or "Monday at 6:00 p.m. and Thursday at 6:00 p.m."

The schedule trigger is flexible because it is agnostic to the dataset pattern and does not distinguish between time-series and non-time-series data. When you create an Azure Data Factory Schedule trigger, you use a JSON definition to specify the scheduling and recurrence.

To have your Azure Data Factory Schedule trigger start a pipeline run, include a pipeline reference to the specific pipeline in the trigger definition. There is a many-to-many relationship between pipelines and triggers: a single pipeline can be started by multiple triggers, and a single trigger can start multiple pipelines.

{
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": <<Minute, Hour, Day, Week, Month>>,
        "interval": <<int>>, // How often to fire
        "startTime": <<datetime>>,
        "endTime": <<datetime>>,
        "timeZone": "UTC",
        "schedule": { // Optional (advanced scheduling specifics)
          "hours": [<<0-23>>],
          "weekDays": [<<Monday-Sunday>>],
          "minutes": [<<0-59>>],
          "monthDays": [<<1-31>>],
          "monthlyOccurrences": [
            {
              "day": <<Monday-Sunday>>,
              "occurrence": <<1-5>>
            }
          ]
        }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "type": "PipelineReference",
          "referenceName": "<Name of your pipeline>"
        },
        "parameters": {
          "<parameter 1 Name>": {
            "type": "Expression",
            "value": "<parameter 1 Value>"
          },
          "<parameter 2 Name>": "<parameter 2 Value>"
        }
      }
    ]
  }
}
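As a rough illustration of how the optional schedule block narrows the recurrence, this standard-library Python sketch expands a weekly spec (hypothetical values mirroring the weekDays/hours/minutes fields above) into concrete fire times for one week:

```python
from datetime import datetime, timedelta

def weekly_fire_times(week_start, week_days, hours, minutes):
    """Expand a schedule-style spec (weekDays/hours/minutes) over one week.

    `week_start` is assumed to be a Monday; the parameter names mirror
    the JSON fields above but this is an illustrative model, not the
    service's actual scheduling engine.
    """
    day_index = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
                 "Friday": 4, "Saturday": 5, "Sunday": 6}
    times = []
    for day in week_days:
        for hour in hours:
            for minute in minutes:
                times.append(week_start + timedelta(days=day_index[day],
                                                    hours=hour, minutes=minute))
    return sorted(times)

# "Monday and Thursday at 6:00 p.m.", as in the weekly example earlier.
week = datetime(2022, 4, 18)  # a Monday (hypothetical week)
runs = weekly_fire_times(week, ["Monday", "Thursday"], [18], [0])
print([t.strftime("%a %H:%M") for t in runs])  # ['Mon 18:00', 'Thu 18:00']
```

Note how the cross-product of weekDays, hours, and minutes yields every fire time, which is why listing many values in each field can multiply the number of runs (and their cost) quickly.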

Conclusion

In this article, you learnt about Azure Data Factory Schedules, specifically schedule triggers, functions, and executions. The service currently supports three types of triggers: the Schedule trigger, a timer-based trigger that initiates a pipeline; the Tumbling window trigger, which operates on a periodic interval while retaining state; and the Event-based trigger, which fires in response to an event.

The most popular trigger for scheduling the execution of a Data Pipeline is the Azure Data Factory Schedule Trigger. You can specify a start and end time for the Azure Data Factory Schedule Trigger to be active, and it will only run a Pipeline during that time period.

Visit our Website to Explore Hevo

Hevo Data provides an Automated No-code Data Pipeline that empowers you to overcome the above-mentioned limitations. Hevo caters to 100+ data sources (including 40+ free sources) and can seamlessly load data to a Data Warehouse in real-time. Furthermore, Hevo’s fault-tolerant architecture ensures a consistent and secure transfer of your data to the destination.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about Azure Data Factory Schedules in the comments section below!
