Automation reduces manual work and plays a key role in improving productivity across industries. It is one of the fastest, least time-consuming ways for businesses to achieve higher production rates and better work efficiency. Yet many teams, unsure how to get started, never automate their tasks and keep performing them manually.

Every IT professional has a different job or workflow to run, from collecting data from various sources to processing it, uploading it, and creating reports. Many of these tasks are still performed manually every day. To trigger workflows automatically and reduce the time and effort this takes, we recommend using Apache Airflow.

Apache Airflow is an open-source tool for managing complex workflows. This powerful workflow management platform lets you programmatically author, schedule, and monitor daily tasks, and makes it easier to spot and resolve issues. Data Scientists and Data Engineers in particular find it useful. After a complete walkthrough of this article, you will have a solid understanding of Apache Airflow and the steps required to automate sending emails using the Airflow EmailOperator.


What is Apache Airflow?


Started at Airbnb in 2014, Apache Airflow has grown into a trusted open-source workflow management platform. Written in Python, this popular workflow engine simplifies complex data pipelines and automates management tasks, ensuring that each task is processed and executed in the correct order. To orchestrate workflows, Airflow uses Directed Acyclic Graphs (DAGs) that run on a schedule or when an external event triggers them.

The platform lets you visualize success status, data pipeline dependencies, code, logs, and progress, and helps you troubleshoot issues whenever needed. Apache Airflow is a flexible, scalable workflow platform designed to orchestrate complex business logic.

Key Features of Apache Airflow

Some of the key features of Apache Airflow are as follows:

  • It is easy to use if you have a fundamental understanding of Python.
  • Apache Airflow is a free, scalable, open-source workflow management platform.
  • It integrates easily with other platforms such as Amazon AWS, Microsoft Azure, and Google Cloud.
  • Workflows are written in Python, which simplifies building complex pipelines.
  • Its rich user interface helps you monitor and manage complex workflows and keep track of ongoing tasks and their status.

What is EmailOperator?

While DAGs in Airflow define the workflow, operators are the building blocks that determine the actual work. Each operator specifies the action to be performed at a given step (a short illustration follows the list below). There are different operators for common tasks, including:

  • PythonOperator
  • MySqlOperator
  • EmailOperator
  • BashOperator
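As a quick illustration of how an operator turns a single action into a task, here is a minimal BashOperator sketch. The task_id, the shell command, and the dag object are illustrative assumptions, not part of the tutorial's DAG:

from airflow.operators.bash_operator import BashOperator

print_date = BashOperator(
    task_id='print_date',    # illustrative task name
    bash_command='date',     # the shell command this task runs
    dag=dag)                 # assumes an existing DAG object named dag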

The Airflow EmailOperator delivers email notifications to the stated recipients. It is the most direct way for Airflow to send emails, whether task-related messages or alerts to notify users. The main drawback of the EmailOperator is that it is not very customizable. Here is the code:

from airflow.operators.email_operator import EmailOperator

t4 = EmailOperator(
       task_id='t4',                     # task_id must be a string
       to='test@mail.com',
       subject='Alert Mail',
       html_content=""" Mail Test """,
       dag=dag                           # assumes an existing DAG object named dag
)
Simplify your Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration from 100+ Data Sources (including 40+ Free sources) and will let you directly load data from different sources to a Data Warehouse or the Destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data. 

Get Started with Hevo for Free

Let’s look at some of the salient features of Hevo:

  • Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
  • Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer. 
  • Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
  • Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
  • Connectors: Hevo supports 100+ Integrations with SaaS platforms, FTP/SFTP, Files, Databases, BI tools, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake, and Firebolt Data Warehouses; Amazon S3 Data Lakes; Databricks; and MySQL, SQL Server, TokuDB, MongoDB, and PostgreSQL Databases, to name a few.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!

How to Send Emails using Airflow EmailOperator

The email automation feature helps business stakeholders send logs on time, raise alerts on error files, and share results for data analysis. It also improves engagement and creates a better experience for the recipient. By automating email, the recipient is notified in a timely manner whether the data pipeline failed or is still running. Overall, the process saves time and reduces the manual burden on experts.

To test the email operator job, you must add a DAG file that runs a Python function. Once the Python function executes successfully, the Airflow EmailOperator sends the email to the recipient. For this to work, you need Apache Airflow installed, for example on an Ubuntu virtual machine. Follow the steps below to send an email from Airflow using the Airflow EmailOperator.

Step 1: Login to the Gmail Account

Before you begin using Airflow, change your Google Account settings to allow less secure apps. This step is necessary so that Google allows your code to access the account. Once the code is live, you can revert the setting for security reasons.

To change the settings, go to Google Account => Settings => Less secure app access => Turn it on.

Python ships with the smtplib module, which defines an SMTP client session object that can send mail to any machine on the Internet running an SMTP or ESMTP listener daemon.
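As a quick sanity check before touching Airflow, you can verify that your Gmail credentials work over SMTP with a few lines of smtplib. This is only a sketch: the addresses and password below are placeholders you must replace with your own.

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Airflow SMTP test"
msg["From"] = "your_address@gmail.com"       # placeholder sender address
msg["To"] = "recipient@example.com"          # placeholder recipient address
msg.set_content("If you receive this, your SMTP credentials work.")

with smtplib.SMTP("smtp.gmail.com", 587) as server:
    server.starttls()                         # upgrade the connection to TLS
    server.login("your_address@gmail.com", "your_password")
    server.send_message(msg)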

Step 2: Enable IMAP for the SMTP

  • Go to Settings using the gear icon in your Gmail account.
  • Now, click on the 'Forwarding and POP/IMAP' tab under Settings.
  • Lastly, under the 'IMAP access' sub-section, select the Enable IMAP radio button.
Gmail Account Settings

Step 3: Update SMTP details in Airflow

In this step, to use the Airflow EmailOperator, you need to update the SMTP details in the airflow.cfg config file, found in your Airflow home directory (by default, ~/airflow/airflow.cfg).

  • Now, using any editor, open the airflow.cfg file.
  • Add the following configuration under the [smtp] section:
# If you want Airflow to send emails on retries and failures, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# SMTP server here
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
# Example: smtp_user = airflow
smtp_user = <your Gmail address>
# Example: smtp_password = airflow
smtp_password = <your Gmail password>
# Gmail uses port 587 for STARTTLS connections
smtp_port = 587
smtp_mail_from = <the email address the mails should be sent from>
# Restart the Airflow webserver and scheduler after editing this file so the changes take effect
  • Use the following command to create a DAG file in the ~/airflow/dags folder:
sudo gedit emailoperator_demo.py
  • Once the DAG file is created, it is time to write the DAG code.

Step 4: Import modules for the Workflow

You now need to import Python dependencies for the workflow. You can refer to the following code:

import airflow
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.email_operator import EmailOperator
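Note that these import paths follow the older Airflow 1.x module layout. If you are running Airflow 2.x, the equivalent imports should be the following:

from airflow.operators.python import PythonOperator
from airflow.operators.email import EmailOperator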

Step 5: Define the Default Arguments

Next up, you can define the default and DAG-specific arguments:

default_args = {
    'owner': 'airflow',
    # 'start_date': airflow.utils.dates.days_ago(2),
    # 'end_date': datetime(),
    # 'depends_on_past': False,
    # 'email': ['airflow@example.com'],
    # 'email_on_failure': False,
    # 'email_on_retry': False,
    # If a task fails, retry it once after waiting
    # at least 5 minutes
    # 'retries': 1, 'retry_delay': timedelta(minutes=5),
}

Step 6: Instantiate a DAG

In this step, give the DAG a name, configure its settings, and set the schedule.

dag_email = DAG(
    dag_id='emailoperator_demo',
    default_args=default_args,
    schedule_interval='@once',
    dagrun_timeout=timedelta(minutes=60),
    description='use case of email operator in airflow',
    start_date=airflow.utils.dates.days_ago(1))
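If you prefer, the same DAG can also be declared with a context manager, a style that is common in newer Airflow code; this sketch assumes the same arguments as above:

with DAG(
    dag_id='emailoperator_demo',
    default_args=default_args,
    schedule_interval='@once',
    dagrun_timeout=timedelta(minutes=60),
    description='use case of email operator in airflow',
    start_date=airflow.utils.dates.days_ago(1),
) as dag_email:
    # tasks instantiated inside this block are attached to dag_email automatically
    pass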

Step 7: Setting up Tasks

This step involves setting up the workflow tasks. Below, a Python task and an email task are instantiated.

def start_task():
    print("task started")

start_task = PythonOperator(
    task_id='executetask',
    python_callable=start_task,      # calls the Python function defined above
    dag=dag_email)

send_email = EmailOperator(
    task_id='send_email',
    to='vamshikrishnapamucv@gmail.com',
    subject='ingestion complete',
    html_content="Date: {{ ds }}",   # Jinja template rendered with the run date
    dag=dag_email)
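The to, subject, and html_content fields are templated, which is why the Jinja expression {{ ds }} above is rendered with the run's date. The EmailOperator also accepts optional cc, bcc, and files arguments; the sketch below is illustrative, with placeholder addresses and a hypothetical attachment path:

send_report = EmailOperator(
    task_id='send_report',
    to='recipient@example.com',              # placeholder recipient
    cc=['team@example.com'],                 # optional carbon copy
    subject='Report for {{ ds }}',           # templated subject line
    html_content='<p>Pipeline finished on {{ ds }}.</p>',
    files=['/tmp/report.csv'],               # optional attachments (illustrative path)
    dag=dag_email)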

Step 8: Set Dependencies

Set dependencies for the tasks that need to be executed; a DAG file by itself only organizes the tasks. Define the dependency between the two tasks as follows to complete the data pipeline:

send_email.set_upstream(start_task)

if __name__ == "__main__":
    dag_email.cli()
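Equivalently, the same dependency can be written with Airflow's bitshift syntax:

start_task >> send_email   # send_email runs only after start_task succeeds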

As per the code, the send_email task will execute only after the executetask Python task completes.

Step 9: Task Verification

  • Unpause the emailoperator_demo DAG as shown in the screenshot.
Task Verification for Airflow EmailOperator
  • Select the "emailoperator_demo" DAG and look for its log file. Now, select Graph View. Here you will see two tasks: the executetask Python task and the send_email task.
Send Email
  • Click on the executetask node in the Graph View to see how the query ran in the log file. A new window will appear on the screen.
Executing Task
  • Now, to check the log files, select the Log tab. A list of active tasks will show up on your screen.
List of active tasks
  • Here is how the task output is displayed. Follow the same steps for the send_email task.
Output
  • Here is how the send_email task output is displayed when an email is sent.
Ingestion Complete

Conclusion

Apache Airflow uses Directed Acyclic Graphs (DAGs) and operators to run tasks and send emails to recipients. With the Airflow EmailOperator, it can send task-related emails or alerts to the specified recipients in a timely manner.

This guide showed you how to send emails from Airflow using the Airflow EmailOperator. By adopting email automation, your business stakeholders will be able to improve engagement and create a better experience for all recipients.

As your business grows, data is generated at an exponential rate across all of your company's SaaS applications, Databases, and other sources. To meet these growing storage and computing needs, you would need to invest a portion of your engineering bandwidth to Integrate data from all sources, Clean & Transform it, and finally load it into a Cloud Data Warehouse for further Business Analytics. All of these challenges can be efficiently handled by a Cloud-Based ETL tool such as Hevo Data.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations, with a few clicks. Hevo Data with its strong integration with 100+ sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Share with us your experience of learning about Airflow EmailOperator in the comments below!

Hitesh Jethva
Freelance Technical Content Writer, Hevo Data

Hitesh is a skilled freelance writer in the data industry. He creates engaging and informative content on subjects such as data analytics, machine learning, AI, big data, and business intelligence by using his analytical thinking and problem-solving abilities.
