Data Engineering

Apache Airflow - Advanced

Sensors, XCom, TaskFlow API, pools, priority, dynamic DAGs, KubernetesPodOperator, monitoring

20 interview questions · Senior
1

What is the main role of a Sensor in Apache Airflow?

Answer

A Sensor is a special operator that waits for a condition to be met before continuing DAG execution. It periodically checks (pokes) whether the condition is satisfied, such as file arrival, partition availability, or another task's state. Sensors are essential for orchestrating workflows dependent on external events.
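The poking behaviour described above can be sketched in plain Python. This is a simplified illustration, not Airflow's implementation: the names `wait_for`, `file_has_arrived`, and `checks` are hypothetical. In Airflow itself the condition lives in a sensor's `poke` method, and the framework handles the loop, interval, and timeout.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True   # condition met: downstream work may proceed
        if time.monotonic() >= deadline:
            return False  # gave up waiting
        time.sleep(poke_interval)


# Hypothetical condition that becomes true on the third check,
# standing in for "the file has arrived".
checks = {"count": 0}


def file_has_arrived() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


result = wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
```

Here `result` is `True` once the third check succeeds; a condition that never becomes true makes `wait_for` return `False` after the timeout.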

2

What is the difference between 'poke' and 'reschedule' modes for a Sensor?

Answer

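In poke mode, the Sensor occupies a worker slot for its entire runtime, sleeping between checks that run every poke_interval seconds. In reschedule mode, the Sensor releases the worker slot after each unsuccessful check and is rescheduled to run again once the interval has passed. Reschedule mode is recommended when the wait is expected to be long, as it frees resources for other tasks; for short waits with frequent checks, poke mode avoids the extra scheduling overhead.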
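A back-of-the-envelope calculation shows how much the two modes differ in resource use. The helper `slot_seconds` and the figures below are hypothetical, and the model deliberately ignores the scheduler overhead of each reschedule.

```python
def slot_seconds(mode: str, checks: int, poke_interval: float, check_duration: float) -> float:
    """Worker-slot time used by a sensor that succeeds on its last check."""
    if mode == "poke":
        # The slot is held while checking and while sleeping between checks.
        return checks * check_duration + (checks - 1) * poke_interval
    if mode == "reschedule":
        # The slot is held only while each check runs.
        return checks * check_duration
    raise ValueError(f"unknown mode: {mode}")


# Assumed scenario: about an hour of waiting, 61 checks that are
# 60 seconds apart and take 1 second each.
poke_total = slot_seconds("poke", checks=61, poke_interval=60, check_duration=1)
reschedule_total = slot_seconds("reschedule", checks=61, poke_interval=60, check_duration=1)
```

Under these assumptions the poke-mode sensor holds a slot for 3661 seconds, while the reschedule-mode sensor holds one for 61 seconds in total.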

3

Which Sensor should be used to wait for a Hive partition to be available?

Answer

HivePartitionSensor checks for the existence of a specific partition in a Hive table. It is commonly used in data pipelines to ensure source data is available before running transformations. It accepts parameters such as schema, table, and partition to identify the partition to check.
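A minimal DAG fragment using this sensor might look like the following. It assumes the `apache-airflow-providers-apache-hive` package is installed and a metastore connection is configured; the DAG id, schema, table, and timing values are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.sensors.hive_partition import HivePartitionSensor

with DAG(
    dag_id="wait_for_hive_partition",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # assumes Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    wait_for_partition = HivePartitionSensor(
        task_id="wait_for_partition",
        schema="analytics",         # hypothetical Hive database
        table="events",             # hypothetical table
        partition="ds='{{ ds }}'",  # partition filter, templated with the run's logical date
        mode="reschedule",          # free the worker slot between checks
        poke_interval=300,          # check every 5 minutes
        timeout=6 * 60 * 60,        # fail after 6 hours of waiting
    )
```

Transformation tasks would then be set downstream of `wait_for_partition` so that they only start once the partition exists.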

4

How do you pass data between two Airflow tasks?

5

What is the recommended maximum size for data stored in XCom?

