
Apache Airflow - Fundamentals
DAGs, operators (Bash, Python, SQL), scheduling, task dependencies, Airflow UI, connections, variables, trigger rules
1. What is a DAG in Apache Airflow?
Answer
A DAG (Directed Acyclic Graph) is a collection of tasks organized with dependencies and relationships, representing a complete workflow. Acyclic means the dependency graph cannot contain loops, which guarantees a well-defined execution order and that each task runs exactly once per DAG run. The DAG defines when and in what order tasks run, not what each task concretely does; that is the job of the operators.
2. Which DAG parameter defines the date from which the scheduler starts scheduling runs?
Answer
The start_date parameter defines the date from which Airflow starts scheduling DAG runs. It is combined with schedule_interval to determine each run's data interval. An important point: if start_date is in the past, Airflow will schedule catch-up runs for every missed interval between start_date and now, unless catchup=False is set on the DAG.
3. Which operator should be used to execute a Python function in an Airflow DAG?
Answer
The PythonOperator allows executing a Python callable function in an Airflow DAG. The function is passed via the python_callable parameter and can receive arguments via op_args (list) or op_kwargs (dictionary). The PythonOperator is one of the most commonly used operators because it offers great flexibility for running custom Python code.
4. How do you define a dependency between two tasks task_a and task_b so that task_b runs after task_a?
Answer
Use the bitshift operators: task_a >> task_b, or equivalently task_b << task_a. The explicit methods task_a.set_downstream(task_b) and task_b.set_upstream(task_a) achieve the same result; the >> / << syntax is the idiomatic and most readable form.
5. Which cron expression represents a daily execution at midnight?
Answer
The expression 0 0 * * * (minute 0, hour 0, every day of the month, every month, every day of the week) triggers a run every day at midnight. Airflow also accepts the preset alias @daily, which is equivalent.