Apache Airflow - Basics to Advanced
From Saturday, 24 Jan 09:00 to Sunday, 25 Jan 23:00
Timezone: GMT+2 (the long duration means this is appropriate for multiple timezones - try to attend for 6 to 8 hours per day)
Location: remote
Level: Beginner to Advanced
The instructor will be on duty from 09:00 to 23:00 (GMT+2) each day. You can arrive and leave whenever you need to, but it is recommended that you are present for at least 6 to 8 hours per day.
Payment Information
When you choose to buy a ticket you will be redirected to Quicket, where ticket prices are displayed in South African Rands ($1 is about R17.40).
Credit cards will work in the usual way.
If you would prefer to pay for your ticket in a different way, please get in touch.
Details
Apache Airflow was originally created by the nice folks at Airbnb. Airbnb was growing rapidly and, as they grew, so did their task and data pipeline orchestration needs. They created Airflow to solve their own urgent needs. And then they open-sourced it.
Airflow is a kind of task scheduler, but with a lot of superpowers. Here are a few of the things it's good at:
- Complex Task Interdependencies: You can build intricate task structures to handle many different needs. In Airflow these are called DAGs - Directed Acyclic Graphs. You can do some pretty hardcore things with these
- Logging and Monitoring: You can see what tasks ran and when, and exactly what happened
- Retries: You can set tasks up so that they retry themselves, and you can rerun tasks and entire workflows whenever you want to
- Secret Management: Automated tasks often need credentials, and those need to be kept safe
- Scale: Airflow allows you to create as many workers as you need, and those workers can be spread across many computers/VMs/pods/whatever. There isn't an upper bound
- Usability: It has a really nice UI that you can interact with to view and control things
- Extensible: Airflow is designed for flexibility
One of the really cool things about Airflow is that DAGs are authored using Python code. A DAG is a graph of tasks and their interdependencies. Since DAGs are written with normal Python code (instead of some kind of configuration language), you can be quite creative about how you author them.
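For example, here is a minimal sketch of a DAG with two dependent tasks. It assumes Airflow 2.4+ (where the schedule argument accepts a preset or cron string), and the DAG and task names are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


def say_goodbye():
    print("Goodbye from Airflow!")


with DAG(
    dag_id="hello_world",             # the name shown in the Airflow UI
    start_date=datetime(2025, 1, 1),  # the DAG's first logical date
    schedule="@daily",                # run once per day
    catchup=False,                    # don't backfill runs before today
) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)
    goodbye = PythonOperator(task_id="say_goodbye", python_callable=say_goodbye)

    hello >> goodbye  # ">>" makes say_goodbye wait for say_hello
```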
What We Will Cover
This workshop will take you from the basics to advanced concepts, helping you to get to grips with some of Airflow's weirder parts for building, scheduling, and managing workflows at any scale. We'll show you how to use Airflow to create reliable, maintainable data pipelines, automate tasks, and troubleshoot complex workflows effectively.
Here is some of what we'll be covering:
- Introduction to Workflow Orchestration and Apache Airflow: Explore the role of Airflow in data engineering and why it’s a popular choice for building and automating complex workflows.
- Installing and Configuring Airflow: Get started with Airflow by setting up a development environment and learning the essentials of configuring the platform.
- Working with DAGs (Directed Acyclic Graphs): Learn how to define workflows using DAGs and explore their structure to create clear, maintainable workflows.
- Creating and Scheduling Tasks: Discover the basics of creating tasks with Python operators, dependencies, and setting up scheduling rules.
- Managing Data Pipelines: Build modular and reusable pipelines by breaking workflows into smaller tasks and organizing code efficiently. We'll also explore the importance of idempotent tasks.
- Error Handling and Retries: Learn to manage failures and set up retry logic to make workflows more resilient and fault-tolerant (see the first sketch after this list).
- Working with Airflow Operators: Explore the range of built-in operators and use custom operators to tailor workflows to your needs.
- Managing Dependencies and Task Concurrency: Set up task dependencies and manage parallel execution to optimize workflow performance.
- Advanced Scheduling with CRON and Timetables: Deepen your understanding of scheduling and configure custom schedules to fit specific data needs (the first sketch after this list uses a cron schedule).
- Approximate Stream Processing Through Micro-Batching: Airflow isn't designed for stream processing; it's a batch-processing tool. But if you make the batches small enough, you can work with data in a way that is close to real-time (see the micro-batching sketch after this list).
- Managing Secrets and Credentials: Securely handle credentials and API keys using Airflow’s connection management and environment variables.
- Workflow Monitoring and Logging: Track and troubleshoot workflows with Airflow’s logging and monitoring tools, helping you spot issues before they become problems.
- The TaskFlow API and XCom: Simple task dependency inference and data passing (see the TaskFlow sketch after this list).
- Parallel Task Creation Using Task Groups: Create tasks and their dependencies in bulk using the task_group decorator (see the task-group sketch after this list).
- Event-driven scheduling: Triggering actions based on external events
- Assets: Updating data assets, and asset-based scheduling
- Common Integrations: How to get Airflow to talk to your Django database in three ways: through the ORM, a web API, or straight SQL (see the SQL sketch after this list). Note that these techniques can be used with any Python ORM and even other types of databases.
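To give you a taste of a few of these topics, here are some short sketches. All of them assume Airflow 2.4+, and every DAG name, task name, and setting is a made-up example rather than production code. First, retry logic combined with a cron schedule:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="resilient_report",
    start_date=datetime(2025, 1, 1),
    schedule="0 6 * * 1-5",  # cron expression: 06:00 on weekdays
    catchup=False,
    default_args={
        "retries": 3,                         # retry each failed task up to 3 times
        "retry_delay": timedelta(minutes=5),  # wait 5 minutes between attempts
    },
) as dag:
    PythonOperator(
        task_id="build_report",
        python_callable=lambda: print("building report..."),
    )
```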
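Next, micro-batching: a rough sketch of a DAG that runs every five minutes and only processes the records inside its data interval. The actual fetching logic is left as a print statement for illustration:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    start_date=datetime(2025, 1, 1),
    schedule="*/5 * * * *",  # one small batch every 5 minutes
    catchup=False,
    max_active_runs=1,       # don't let slow batches pile up
)
def micro_batch():
    @task
    def process_window(data_interval_start=None, data_interval_end=None):
        # Airflow injects the interval bounds by argument name; a real task
        # would select only the records created inside this 5-minute window.
        print(f"processing records from {data_interval_start} to {data_interval_end}")

    process_window()


micro_batch()
```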
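Here is a minimal TaskFlow sketch: returning a value from a @task function pushes it to XCom, and passing that value into another task both moves the data and infers the dependency (the task names are made up):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def taskflow_demo():
    @task
    def extract():
        return [1, 2, 3]  # pushed to XCom automatically

    @task
    def total(numbers):
        print(f"sum = {sum(numbers)}")  # pulled from XCom automatically

    total(extract())  # the extract >> total dependency is inferred


taskflow_demo()
```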
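And a sketch of creating tasks in bulk with the task_group decorator, using placeholder source names:

```python
from datetime import datetime

from airflow.decorators import dag, task, task_group


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def grouped_pipeline():
    @task
    def fetch(source):
        print(f"fetching {source}")

    @task
    def load():
        print("loading...")

    @task_group
    def fetch_all():
        # Calling the same @task in a loop creates one task per source;
        # Airflow suffixes the duplicate task IDs (fetch, fetch__1, ...).
        for source in ("orders", "users", "events"):
            fetch(source)

    fetch_all() >> load()  # every fetch task must finish before load runs


grouped_pipeline()
```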
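Finally, the "straight SQL" flavour of the Django integration. This sketch assumes the Postgres provider package is installed and that an Airflow connection named django_db has been configured; the connection name is an assumption, and auth_user is Django's default user table:

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False)
def django_db_report():
    @task
    def count_users():
        # "django_db" is a hypothetical Airflow connection pointing at the
        # database behind your Django app; we query it directly with SQL.
        hook = PostgresHook(postgres_conn_id="django_db")
        rows = hook.get_records("SELECT count(*) FROM auth_user")
        print(f"total users: {rows[0][0]}")

    count_users()


django_db_report()
```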
How the Workshop Will Work
You will be introduced to concepts in a hands-on way. Every concept will be practised and implemented.
You will also be given DAG challenges to solve along the way to build up and solidify your skills.
Prerequisite Knowledge
Participants need to be comfortable writing Python code.
Prerequisite Software
Please be aware that Airflow only works on certain operating systems. Here is a link to the official docs:
Installation Prerequisites
About the instructor
Sheena O'Connell
Sheena's early career saw her working as a software engineer and technical leader across multiple startups. But it was her passion for education that led her to devote the last 5+ years to reimagining how we teach people to code professionally.
Over the last half decade she has had the opportunity to work in the NGO space and build alternative education systems from the ground up. Along the way she has learned a lot about how to teach well, how to build systems that teach well, how to set teachers up for success, and how traditional education systems fall short.
"I've always had a passion for education and had the opportunity to work directly in tech education for the last half decade. The way I think of my work is: I take the science of learning and turn it into the engineering of learning."
Sheena's technical skills are fairly wide ranging, but she has a strong focus on all things Python and web development.
She is also a recognised international speaker who primarily focuses on spreading tech education best practices around the world.
Want to know more about Sheena? Here are some links: