Comprehensive Apache Airflow


Course Number: PYTH-224

Duration: 5 days (32.5 hours)

Format: Live, hands-on

Beginning and Advanced Airflow Overview

This Comprehensive Airflow training course teaches software engineers and data engineers the fundamental and advanced Airflow skills they need to successfully orchestrate production-ready data pipelines. Students learn how to create sophisticated DAGs (Directed Acyclic Graphs) and apply security practices to Apache Airflow. In addition, students learn how to scale Airflow within Kubernetes. 

Location and Pricing

Accelebrate offers instructor-led enterprise training for groups of 3 or more online or at your site. Most Accelebrate classes can be flexibly scheduled for your group, including delivery in half-day segments across a week or set of weeks. To receive a customized proposal and price quote for private corporate training on-site or online, please contact us.

Objectives

  • Create production-ready data pipelines in Airflow
  • Build pipelines in Airflow that scale to hundreds of tasks
  • Enforce modularization and reusability of Airflow tasks across projects
  • Scale Airflow in Kubernetes
  • Secure your Apache Airflow installation
  • Create highly concurrent DAGs in Kubernetes
  • Take advantage of the new functionality introduced in Airflow 2.x

Prerequisites

All attendees should have basic Python knowledge or object-oriented programming experience.  

Outline

Introducing Apache Airflow
  • What is Airflow and what does it solve?
  • Airflow architecture
  • How do we represent a Pipeline?
  • Our first DAG
  • Tasks, TaskFlow, and Operators
  • First Pipeline
Mastering scheduling
  • execution_date, start_date and schedule_interval
  • Handling non-default schedule_intervals
  • Playing with time
Abstracting functionality
  • Using custom operators
  • Creating TaskGroups vs SubDAGs
  • Sharing data with XComs
  • Branching and Triggers
  • Sensors and SmartSensors
Executors and Scaling Airflow
  • Abandoning SQLite for PostgreSQL
  • Executors: Debug, Local, Celery
  • Concurrency and parallelism
  • Concurrency with Celery
  • Airflow in Kubernetes, the old and new ways
  • KEDA and HA scheduler
  • Deploying a highly available, fault-tolerant Airflow
Creating DAGs
  • Secrets, connections, and variables
  • Creating connections on startup
  • Using Pools for long-running and demanding tasks
  • Simulating long-running tasks
  • DAG serialization
  • DAG versioning
  • Testing DAGs
  • CI/CD in Airflow
Modularizing DAGs
  • TaskGroups vs SubDAGs
  • TaskFlow API and XComs
  • Modularizing
  • Dynamic and Functional DAGs
  • SmartSensors and timeouts
Airflow Security
  • RBAC in Airflow
  • Setting up OAuth authentication
  • Add Google OAuth
  • Adding SSL certs
  • Default Roles and custom roles
  • Creating a custom role
Airflow in Kubernetes
  • The Helm chart
  • Deploying Airflow with Helm
  • Deploying single tasks to Kubernetes: KubernetesPodOperator
  • Adding a task in Kubernetes
  • Scaling Airflow with Kubernetes executor
  • Changing the Helm chart's values
  • KEDA autoscaler
  • Preparing DAGs for Kubernetes
  • Creating a DAG fully in Kubernetes
  • The CeleryKubernetes executor for extreme scalability
Upgrading from Airflow 1.10
Conclusion

Training Materials

All Airflow training students receive comprehensive courseware.

Software Requirements

  • Python 3.6 or later (Airflow 2.x does not support Python 3.5)
  • Airflow 2.1 or later


Learn faster

Our live, instructor-led lectures are far more effective than pre-recorded classes

Satisfaction guarantee

If your team is not 100% satisfied with your training, we do what's necessary to make it right

Learn online from anywhere

Whether you are at home or in the office, we make learning interactive and engaging

Multiple Payment Options

We accept check, ACH/EFT, major credit cards, and most purchase orders


