
Apache Airflow - Advanced
Sensors, XCom, TaskFlow API, pools, priority, dynamic DAGs, KubernetesPodOperator, monitoring
1. What is the main role of a Sensor in Apache Airflow?
Answer
A Sensor is a special type of operator that waits for a condition to be met before its downstream tasks run. It periodically checks ("pokes") whether the condition is satisfied, for example a file arriving, a partition becoming available, or another task reaching a given state. Sensors are essential for orchestrating workflows that depend on external events.
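The poking behaviour can be sketched in plain Python. This is a simplified model of the idea, not Airflow's implementation: the function name `wait_for` and its `sleep` and `clock` parameters are hypothetical and exist only so the loop can be run and tested without an Airflow installation.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll `condition` until it returns True and return the number of pokes.

    Raises TimeoutError if the condition is still False once `timeout`
    seconds have elapsed.
    """
    started = clock()
    pokes = 0
    while True:
        pokes += 1
        if condition():
            return pokes  # condition met: downstream work may proceed
        if clock() - started >= timeout:
            raise TimeoutError(f"condition not met after {pokes} pokes")
        sleep(poke_interval)  # wait before the next check


# Hypothetical condition: becomes True on the third check.
answers = iter([False, False, True])
pokes_needed = wait_for(lambda: next(answers), poke_interval=0.01, timeout=5.0)
```

In Airflow itself the equivalent logic lives in a sensor's `poke` method, which returns True once the condition holds.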
2. What is the difference between 'poke' and 'reschedule' modes for a Sensor?
Answer
In poke mode, the Sensor occupies a worker slot for its entire runtime and sleeps between checks, which run every poke_interval seconds. In reschedule mode, the Sensor releases the worker slot between checks and is rescheduled for the next one. Reschedule mode is recommended when the condition may take a long time to be met, because it frees the slot for other tasks in the meantime.
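A back-of-the-envelope model makes the resource difference concrete. The function below is an illustrative approximation, not an Airflow API: it assumes each check takes a fixed `check_seconds`, that the first check happens immediately, and it ignores scheduler and task start-up overhead (which in practice adds some cost to every rescheduled run).

```python
import math


def slot_seconds_held(
    wait_seconds: float,
    poke_interval: float,
    check_seconds: float,
    mode: str,
) -> float:
    """Approximate worker-slot seconds a sensor holds until its condition is met."""
    # One check at the start, then one per interval until the condition holds.
    pokes = math.ceil(wait_seconds / poke_interval) + 1
    if mode == "poke":
        # The slot is held while checking and while sleeping between checks.
        return (pokes - 1) * poke_interval + pokes * check_seconds
    if mode == "reschedule":
        # The slot is held only while a check is actually running.
        return pokes * check_seconds
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical scenario: data arrives after one hour, checked every 5 minutes,
# each check taking about 2 seconds.
held_in_poke = slot_seconds_held(3600, 300, 2, "poke")
held_in_reschedule = slot_seconds_held(3600, 300, 2, "reschedule")
```

Under these assumptions the poke-mode sensor holds its slot for roughly the whole hour, while the reschedule-mode sensor holds it for well under a minute in total.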
3. Which Sensor should be used to wait for a Hive partition to be available?
Answer
HivePartitionSensor checks for the existence of a specific partition in a Hive table. It is commonly used in data pipelines to ensure source data is available before running transformations. It accepts parameters such as schema, table, and partition, which together identify the partition to check.
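As a rough illustration of those parameters, the sketch below builds the keyword arguments for a sensor that waits on a daily partition. The helper name, schema, table, task_id, and interval values are hypothetical choices for the example. The import shown in the trailing comment assumes the apache-airflow-providers-apache-hive package is installed, so it is left commented out to keep the snippet runnable without Airflow.

```python
def hive_partition_sensor_kwargs(schema: str, table: str, ds: str) -> dict:
    """Build keyword arguments for a HivePartitionSensor on a daily partition."""
    return {
        "task_id": f"wait_for_{table}_partition",  # hypothetical naming scheme
        "schema": schema,                          # Hive database name
        "table": table,                            # table to inspect
        "partition": f"ds='{ds}'",                 # partition filter expression
        "mode": "reschedule",                      # free the worker slot between checks
        "poke_interval": 300,                      # seconds between checks
        "timeout": 6 * 60 * 60,                    # give up after six hours
    }


# "{{ ds }}" is Airflow's template for the run's logical date; it is passed
# through unchanged here and rendered by Airflow at run time.
kwargs = hive_partition_sensor_kwargs("analytics", "events", "{{ ds }}")

# Inside a DAG definition, with the Hive provider installed:
# from airflow.providers.apache.hive.sensors.hive_partition import HivePartitionSensor
# wait_for_events = HivePartitionSensor(**kwargs)
```

Downstream transformation tasks would then be set to depend on the sensor task so they start only once the partition exists.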
How do you pass data between two Airflow tasks?
What is the recommended maximum size for data stored in XCom?