Apache Airflow is one of the most trusted platforms for orchestrating workflows and is widely used and recommended by data engineers. It offers features such as visualization of data pipelines and workflows, workflow status tracking, and detailed access to logs and code. Installing Airflow takes some effort, but it provides the following benefits.
Airflow provides a rich user interface that makes visualizing and monitoring pipelines straightforward. The interface also lets you see the status of workflows and troubleshoot issues when required. Airflow has many readily available connections for working with multiple sources.
Airflow can also send alerts when a workflow succeeds or when it fails. It is a distributed, highly scalable system that can connect to various sources, which makes it flexible. These features allow it to be used efficiently for orchestrating complex workflows and data pipelines.
This article provides a step-by-step guide to installing Airflow on your system.
What is Apache Airflow?
Apache Airflow is a workflow engine that helps schedule and run complex data pipelines. Airflow makes sure that every step of a data pipeline is executed in the predefined order and that each task gets the resources it requires.
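To make "executed in the predefined order" concrete, here is a minimal sketch that uses only the Python standard library, not Airflow itself. The task names and the dependency map are hypothetical; the point is simply that once each task declares what it depends on, a valid run order can be derived automatically.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every declared dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)
print(order)
```

Tasks with no dependency between them (here, validate and transform) may come out in either order. A scheduler such as Airflow can take advantage of that freedom to run them in parallel.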
As you learn about Airflow, it’s important to know about the best platforms for data integration as well. Hevo Data, a No-code Data Pipeline platform, helps to replicate data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services, and simplifies the ETL process. It supports 150+ data sources (including 40+ free data sources) like Asana and is an easy 3-step process. With Hevo’s transformation feature, you can modify the data and make it into analysis-ready form.
How to Install Airflow?
Installing Airflow is not trivial, but because it is mostly a one-time process, the benefits usually outweigh the difficulties for data professionals like you.
Method 1: Installing Airflow in Linux
The installation takes four steps. The walkthrough below runs Ubuntu on a Windows machine through the Windows Subsystem for Linux; if you already have an Ubuntu system, you can start at step 2.
1) Installing Ubuntu:
Ubuntu is a Linux operating system. It gives you more control over the functionality of the system.
- Before installing Ubuntu, turn on Developer Mode on the Windows system.
- Enable the Windows Subsystem for Linux option located in Windows Features.
- Download the Visual C++ module.
- Install Ubuntu from the Microsoft Store or use an ISO file, then start the installation process.
- A terminal will open where you need to enter your username and password. The terminal doesn’t show the password as you type it.
Installing, this may take a few minutes...
Please create a default UNIX user account. The username does not need to match your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username: bull87
- If you closed the terminal after the last step, reopen it by running the bash command from a Windows command prompt. Bash is the shell through which you communicate with the system.
C:\Users\jacks>bash
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
bull87@DESKTOP-G50VTBF:/mnt/c/Users/jacks
2) Installing PIP
Pip is a tool for installing and managing Python packages. Pip is required to download Apache Airflow. Run the following commands to complete this step:
sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python3-setuptools
sudo apt install python3-pip
sudo -H pip3 install --upgrade pip
3) Install Airflow Dependencies
For Airflow to work properly, you need to install its dependencies. Without them, Airflow cannot work to its full potential: features would be missing, and it may even raise errors. To avoid this, run the following commands to install the dependencies.
sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
sudo apt-get install libkrb5-dev
Airflow uses SQLite as its default database, so no separate database server has to be installed for this setup.
4) Install Airflow
Run the following commands to install Airflow on your system.
export AIRFLOW_HOME=~/airflow
pip3 install apache-airflow
pip3 install typing_extensions
# initialize the database
# (Airflow 2.x command; on Airflow 1.10.x the command was: airflow initdb)
airflow db init
# start the web server, default port is 8080
airflow webserver -p 8080
# start the scheduler. I recommend opening up a separate terminal window for this step
airflow scheduler
# visit localhost:8080 in the browser and enable the example dag in the home page
After you execute the commands above, the installation of Airflow on your system is complete, and you can start using the full potential of the tool.
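One way to confirm that both the web server and the scheduler came up is to read the JSON document the web server serves at /health. The sketch below is illustrative only. It assumes the payload maps each component name to an object with a "status" field, and the base URL is the default port used above. The helper names are hypothetical, and the sample payload is made up for the example.

```python
import json
from urllib.request import urlopen


def unhealthy_components(health):
    """Return the names of components whose status is not 'healthy'."""
    return [
        name
        for name, info in health.items()
        if info.get("status") != "healthy"
    ]


def fetch_health(base_url="http://localhost:8080"):
    """Fetch and decode the JSON document served at <base_url>/health."""
    with urlopen(f"{base_url}/health", timeout=5) as response:
        return json.load(response)


# Made-up payload, shaped like the assumed health response, for a case
# where the scheduler was never started in its own terminal.
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy"},
}
problems = unhealthy_components(sample)
print(problems)
```

Against a running installation you would call unhealthy_components(fetch_health()) instead of passing the sample. An empty list then suggests that every reported component is healthy.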
Method 2: Installing Airflow for Windows PC
Here, we are going to install Airflow on a Windows PC using Docker. The prerequisites are:
- Docker Desktop
- Visual Studio Code
Once these are installed, follow these steps:
- Save the docker-compose .yaml file in a separate folder. It is needed to start Apache Airflow.
- Create a .env file to define configuration variables. To do this, open Visual Studio Code and open the folder containing your .yaml file. Then create a new file with the extension .env, enter AIRFLOW_IMAGE_NAME and AIRFLOW_UID into it, and save it.
- Run the docker-compose file. To do so, open a new terminal and enter the following command.
docker-compose up -d
- When you run it, Docker starts pulling the required images and starting the Airflow services.
- When all the services are up, Apache Airflow has been successfully installed.
- Go to localhost:8080 to access the Airflow login page. To create an admin username and password, copy the user-creation command and paste it into the Visual Studio Code terminal.
- Now enter the credentials to log in to Apache Airflow.
Thus, you have successfully set up Apache Airflow on a Windows PC.
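The .env step above can also be scripted. The following is a minimal sketch, not part of the original walkthrough. The helper names are hypothetical, the image tag is a placeholder you would replace with the image you actually want, and the default UID of 50000 is an assumption that should be checked against your docker-compose .yaml file.

```python
from pathlib import Path


def render_env(image_name, uid=50000):
    """Return the text of a .env file holding the two variables from the steps above."""
    return f"AIRFLOW_IMAGE_NAME={image_name}\nAIRFLOW_UID={uid}\n"


def write_env(folder, image_name, uid=50000):
    """Write the .env file into the folder that holds the docker-compose .yaml file."""
    path = Path(folder) / ".env"
    path.write_text(render_env(image_name, uid), encoding="utf-8")
    return path


# Placeholder image tag, used here only to show the resulting file content.
env_text = render_env("apache/airflow:latest")
print(env_text, end="")
```

Calling write_env(".", "apache/airflow:latest") from the folder that contains the .yaml file would create the same file that the Visual Studio Code steps produce by hand.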
There are other options to install Airflow, such as:
- Using released sources:
  - This is suitable if you want to build the software from sources and verify its integrity.
  - You will have to build, install, set up, and handle all components of Airflow on your own.
- Using PyPI:
  - This is useful when you want to install Airflow on a physical or virtual machine and you are not familiar with Docker and containers.
  - Installation this way is supported only through pip, using the constraints mechanism.
  - You can use this method if you are familiar with Python programming and with running software through a custom deployment mechanism.
- Using production Docker images:
  - This is useful if you are familiar with the Docker stack and know how to build container images.
  - You should understand how to install providers and dependencies from PyPI.
  - You should know how to create Docker deployments and link multiple Docker containers together.
- Using the official Airflow Helm chart:
  - This is helpful if you know how to manage infrastructure using Kubernetes and applications on Kubernetes using Helm charts.
- Using managed Airflow services:
  - This can be used when you want someone else to manage your Airflow installation and are ready to pay for it.
- Using third-party images, charts, and deployments:
  - You can use this if you have tried the other installation methods and found them insufficient.
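For the PyPI option, the constraints mechanism means pointing pip at a constraints file that matches both the Airflow version and the Python version. The sketch below only assembles the command; it does not run it. The helper names are hypothetical, the version numbers are illustrative rather than a recommendation, and the URL pattern is assumed to follow the layout of the constraints files published in the Airflow GitHub repository.

```python
import sys


def constraints_url(airflow_version, python_version=None):
    """Build the URL of the constraints file for an Airflow/Python version pair."""
    if python_version is None:
        # Default to the major.minor version of the running interpreter.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_command(airflow_version, python_version=None):
    """Return the pip arguments for a constrained Airflow install."""
    return [
        "pip3", "install",
        f"apache-airflow=={airflow_version}",
        "--constraint", constraints_url(airflow_version, python_version),
    ]


# "2.9.2" and "3.8" are illustrative values only.
command = pip_command("2.9.2", "3.8")
print(" ".join(command))
```

Pinning the Airflow version and using the matching constraints file keeps the dependencies at versions that were tested together, which an unpinned pip3 install apache-airflow does not guarantee.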
Apache Airflow Platform Workflow
Apache Airflow is an open-source platform for scheduling, executing, and monitoring complex workflows, and it is useful for designing the architecture of those workflows. It is one of the most powerful open-source data pipeline platforms currently on the market.
Airflow uses DAGs (Directed Acyclic Graphs) to structure and represent workflows, where the nodes of a DAG represent the tasks. The idea behind Airflow’s design is that every data pipeline can be expressed as code, and it soon became a platform where teams can iterate on workflows quickly using a code-first approach.
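The "pipeline as code" idea can be illustrated with a toy example. Airflow lets you declare that one task runs before another with the >> operator. The Task class below is a simplified stand-in written for this article, not Airflow's real operator class, and the task names are hypothetical.

```python
class Task:
    """Toy stand-in for an Airflow task; not Airflow's real operator class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records that b runs after a, then returns b so that
        # chains such as a >> b >> c read from left to right.
        self.downstream.append(other)
        return other


def edges(*tasks):
    """List the (upstream, downstream) pairs declared on the given tasks."""
    return [(t.task_id, d.task_id) for t in tasks for d in t.downstream]


extract = Task("extract")
transform = Task("transform")
load = Task("load")

# The whole workflow structure is declared in one line of ordinary Python.
extract >> transform >> load

dag_edges = edges(extract, transform, load)
print(dag_edges)
```

Because the structure is plain code, it can be reviewed, versioned, and generated programmatically like any other Python module.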
Features of Apache Airflow
- Ease of use: Deploying Airflow is easy, as it requires only a little knowledge of Python.
- Open Source: It is a free, open-source platform, which has led to a large base of active users.
- Good Integrations: It has readily available integrations that allow working with platforms like Google Cloud, Amazon AWS, and many more.
- Standard Python for coding: Relatively little knowledge of Python is enough to create complex workflows.
- User Interface: Airflow’s UI helps in monitoring and managing the workflows. It also provides a view of the status of tasks.
- Dynamic: Because Airflow is built on Python, anything you can do in Python can be done in an Airflow task.
- Highly Scalable: Airflow allows the execution of thousands of different tasks per day.
Conclusion
Airflow is one of the most powerful workflow management tools on the market. By using it, companies can solve their orchestration problems and complete tasks on time and efficiently. This article gave an overview of Airflow and then a step-by-step guide to installing it.
Airflow is a trusted tool that many companies use because it is an open-source platform. However, creating pipelines, installing Airflow on the system, and monitoring pipelines can all be difficult, because Airflow is an entirely code-based platform that requires considerable expertise to run properly. This issue can be solved by a platform that creates data pipelines without any code.
Hevo is the only real-time ELT No-code Data Pipeline platform that cost-effectively automates data pipelines that are flexible to your needs. With integration with 150+ Data Sources (40+ free sources), we help you not only export data from sources & load data to the destinations but also transform & enrich your data, & make it analysis-ready.
Share your experience of learning about the steps to Install Airflow in the comments section below.
Arsalan is a research analyst at Hevo and a data science enthusiast with over two years of experience in the field. He completed his B.tech in computer science with a specialization in Artificial Intelligence and finds joy in sharing the knowledge acquired with data practitioners. His interest in data analysis and architecture drives him to write nearly a hundred articles on various topics related to the data industry.