Azure Data Factory Error Handling and Failure Management

If there is an error in a process, I want to know about it. In this summary, I go over two different ways to send an email notification and a high-level pipeline design strategy in Azure Data Factory.

Activity Conditions

Data Factory activity conditions

Data Factory flow control does not follow the try/catch/finally paradigm. Instead, the flow is generally controlled with the success, failure, completion (success or failure), and skipped outputs of an activity.

Data Factory activities are branched and chained together in a pipeline. The dependency condition on an activity (success, failure, skipped, completion) determines the control flow to the next activity in the pipeline.

With the “completion” condition, a subsequent activity executes regardless of whether the preceding activity succeeded or failed. With a “skipped” dependency, a subsequent activity executes only if the preceding activity was not executed.
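As a minimal sketch, dependency conditions live in the dependsOn property of an activity’s JSON definition. The fragment below (from a hypothetical pipeline’s activities array, with typeProperties omitted for brevity) wires a notification activity to the “Failed” outcome and a logging activity to the “Completed” outcome of the same preceding activity; only the condition names Succeeded, Failed, Skipped, and Completed come from Data Factory, everything else is illustrative.

```json
{
  "name": "Send Failure Notification",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "Copy Sales Data", "dependencyConditions": [ "Failed" ] }
  ]
},
{
  "name": "Log Run Outcome",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "Copy Sales Data", "dependencyConditions": [ "Completed" ] }
  ]
}
```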

Notification

There are two different ways to send a notification email from Data Factory.

  1. Activity-based email
    Just like any other activity, we can create an email-sending activity in a pipeline. It requires additional Azure service components: a Logic App and an email service provider (e.g. SendGrid). The advantage is that we can fully control the behavior of the email notification based on a condition of the preceding activity; a minimal sketch follows below the screenshot. There are many tutorials on the internet covering this use case. Example link.
  2. Metrics-based alert
    Another way is to use the Alerts & metrics service in Data Factory, which is backed by Azure Monitor. The email is sent after the fact, based on Azure Data Factory diagnostic logs and base-level infrastructure metrics (see the screenshot below). Because it is driven by log metrics, we cannot control activity behavior directly, so it is more appropriate for support/reporting notifications.
Available metrics to set up the alert condition
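For the activity-based option, the pipeline typically ends a branch with a Web activity that posts to a Logic App HTTP trigger, and the Logic App forwards the message to the email provider. The fragment below is an illustrative sketch: the Logic App URL is a placeholder, and the body fields are whatever your Logic App’s trigger schema expects (here the built-in pipeline().Pipeline and pipeline().DataFactory system variables are passed along).

```json
{
  "name": "Send Failure Email",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "Load Staging Table", "dependencyConditions": [ "Failed" ] }
  ],
  "typeProperties": {
    "url": "https://<logic-app-http-trigger-url>",
    "method": "POST",
    "headers": { "Content-Type": "application/json" },
    "body": {
      "dataFactoryName": "@{pipeline().DataFactory}",
      "pipelineName": "@{pipeline().Pipeline}",
      "message": "Load Staging Table failed"
    }
  }
}
```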

Pipeline Design & Flow Control

Consideration

For anyone coming from SSIS packages, the concept of a Data Factory pipeline should be straightforward. However, there are a couple of differences to consider.

  • SSIS defines expressions that are evaluated to determine whether the next task should execute.
  • SSIS allows multiple precedence constraints to be combined with logical AND or logical OR.

The main difference in Data Factory is that activity dependencies are always a logical AND. This means a “Failure” handling activity cannot be shared by two different parent (preceding) activities. For example, suppose Activity A and Activity B are independent. If we want to handle a failure, we have to provide a separate failure activity for A and for B. Because of the logical AND, a shared “Failure” handling activity would not execute unless both parent activities fail, as the fragment below illustrates.
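To illustrate, a single failure handler wired to two independent activities carries both dependencies, and because dependencies combine as a logical AND it would run only when Activity A and Activity B have both failed (names are placeholders, typeProperties omitted):

```json
{
  "name": "Handle Failure",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "Activity A", "dependencyConditions": [ "Failed" ] },
    { "activity": "Activity B", "dependencyConditions": [ "Failed" ] }
  ]
}
```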

To handle this situation, there are several options to consider when designing a pipeline. Detailed examples can be found in this link.

  1. Multiple failure activities – repeat a failure activity for each activity whose failure needs handling. Straightforward, but the least elegant option.
  2. Handle failure at a parent pipeline – create a parent pipeline, use an Execute Pipeline activity for the child pipeline, and add a single failure activity (sketched after this list). This works best if we don’t care which activity failed and want to handle failure at the pipeline level.
  3. Use an If Condition activity – set a failure condition and link the If Condition to the preceding activity with a “Completion” dependency. Inside the If Condition, True means the failure condition was met and handles the failure, and False means the success condition was met and continues with the child activity.
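A minimal sketch of option 2, assuming a child pipeline named LoadPipeline and a Logic App endpoint for the email: the parent pipeline wraps the child in an Execute Pipeline activity (with waitOnCompletion set to true so the parent observes the child’s outcome) and attaches a single failure activity to it.

```json
{
  "name": "ParentPipeline",
  "properties": {
    "activities": [
      {
        "name": "Run Load Pipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "LoadPipeline", "type": "PipelineReference" },
          "waitOnCompletion": true
        }
      },
      {
        "name": "Send Failure Email",
        "type": "WebActivity",
        "dependsOn": [
          { "activity": "Run Load Pipeline", "dependencyConditions": [ "Failed" ] }
        ],
        "typeProperties": {
          "url": "https://<logic-app-http-trigger-url>",
          "method": "POST",
          "body": { "message": "@{pipeline().Pipeline} failed" }
        }
      }
    ]
  }
}
```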

Example

The pipeline below calls multiple stored procedures. Before the ForEach activity executes, a Lookup activity retrieves an array from a transformation table and passes the values down to each stored procedure as parameters. A generic send-failure-email activity may not make sense here, because you will likely want to handle failures in a way that is specific to each activity.

Example pipeline to demonstrate parallel run


Details of the activities in this pipeline:

  1. Lookup – when this pipeline is triggered (REST API, scheduler, or other event), this activity looks up a transformation table and retrieves BatchID and Source. The Lookup activity shown here retrieves only the first row; to work with multiple values, we need to use a “Set Variable” activity.
  2. ForEach – these variables (BatchID and Source) are passed down to the ForEach activity. Inside the ForEach activity we have the stored procedures. For clarity, Stored Procedure 1 and 2 are drawn below the ForEach activity in the diagram, but they are embedded inside it. By default, ForEach runs its iterations in parallel (20 by default, 50 maximum), so Stored Procedure 1 and 2 execute at the same time. If there is a dependency between the calls, we may consider using an “Execute Pipeline” activity instead, which is a container for another pipeline that calls multiple stored procedures in sequence. ForEach can also run sequentially, which would be ideal when there is a dependency. A sketch of the Lookup and ForEach definitions follows this list.
  3. Web – Send failure email, Send completion email. These Web activities call a Logic App, which knows how to parse the values passed down from the Web activity and calls the email sender (e.g. SendGrid).
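Here is a sketch of how the Lookup output can feed the ForEach items and the embedded stored procedure call. Table, dataset, linked service, procedure, and parameter names are hypothetical, and this fragment assumes the Lookup returns all rows (firstRowOnly set to false) rather than the first-row-plus-Set-Variable approach shown in the screenshot.

```json
{
  "name": "Lookup Transformation Table",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT BatchID, Source FROM dbo.TransformationControl"
    },
    "dataset": { "referenceName": "ControlTableDataset", "type": "DatasetReference" },
    "firstRowOnly": false
  }
},
{
  "name": "ForEach Batch",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Lookup Transformation Table", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "isSequential": false,
    "batchCount": 20,
    "items": {
      "value": "@activity('Lookup Transformation Table').output.value",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "Stored Procedure 1",
        "type": "SqlServerStoredProcedure",
        "linkedServiceName": { "referenceName": "AzureSqlDatabase", "type": "LinkedServiceReference" },
        "typeProperties": {
          "storedProcedureName": "dbo.usp_TransformBatch",
          "storedProcedureParameters": {
            "BatchID": { "value": "@item().BatchID", "type": "Int32" },
            "Source": { "value": "@item().Source", "type": "String" }
          }
        }
      }
    ]
  }
}
```

Setting isSequential to true (and removing batchCount) makes the same ForEach run its iterations one at a time, which is the safer choice when the stored procedures depend on each other.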

Except for ForEach, all of these activities have retry settings, so if a preceding activity is delayed or fails transiently, there is a mechanism to try again with a configurable retry interval and retry count.
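Retry is configured per activity through its policy block; a sketch with illustrative values (two retries, 30 seconds apart, two-hour timeout):

```json
{
  "name": "Stored Procedure 1",
  "type": "SqlServerStoredProcedure",
  "policy": {
    "timeout": "0.02:00:00",
    "retry": 2,
    "retryIntervalInSeconds": 30,
    "secureInput": false,
    "secureOutput": false
  }
}
```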

Summary

Error handling and failure management in Azure Data Factory require design work in addition to understanding how control flow works (e.g. ForEach). For monitoring purposes, Azure Data Factory provides email notification out of the box, but activity-specific notification through a subsequent activity requires additional Azure services (Logic App, SendGrid).