Data Engineering

Apache Airflow - Fundamentals

DAGs, operators (Bash, Python, SQL), scheduling, task dependencies, Airflow UI, connections, variables, trigger rules

20 interview questions · Mid-Level
1

What is a DAG in Apache Airflow?

Answer

A DAG (Directed Acyclic Graph) is a collection of tasks organized with explicit dependencies, representing a complete workflow. The acyclic property means the dependency graph contains no cycles, which guarantees that a valid execution order exists and that each task runs at most once per DAG run. The DAG defines when and in what order tasks should run, but not what each task concretely does; that is the role of its operators.
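The acyclicity argument can be illustrated without Airflow at all: model tasks as nodes and dependencies as edges, and a valid run order falls out of a topological sort. A minimal sketch using the standard library (the task names are hypothetical):

```python
# Conceptual illustration (plain Python, no Airflow required): a DAG is a
# set of tasks plus "X must run before Y" edges with no cycles, so a valid
# execution order always exists. graphlib.TopologicalSorter computes one.
from graphlib import TopologicalSorter

# Hypothetical ETL workflow: extract feeds transform and a parallel
# quality check; load waits for both.
dag = {
    "transform": {"extract"},        # transform depends on extract
    "quality_check": {"extract"},
    "load": {"transform", "quality_check"},
}

# static_order() yields every task exactly once, dependencies first.
order = list(TopologicalSorter(dag).static_order())
```

If a cycle were introduced (e.g. making "extract" depend on "load"), `TopologicalSorter` would raise `CycleError` — which is exactly why Airflow rejects cyclic DAGs at parse time.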

2

Which DAG parameter defines the date from which the scheduler starts scheduling runs?

Answer

The start_date parameter defines the date from which the scheduler starts scheduling DAG runs. It is combined with schedule_interval to compute each run's data interval. An important caveat: if start_date lies in the past, Airflow will backfill the missed intervals unless catchup=False is set on the DAG.
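A minimal DAG-file sketch showing these parameters together (assumes Apache Airflow 2.x is installed; the dag_id and dates are illustrative):

```python
# Illustrative DAG-file fragment (requires Apache Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="daily_example",
    start_date=datetime(2024, 1, 1),  # scheduling begins from this date
    schedule_interval="@daily",
    catchup=False,                    # do not backfill past intervals
) as dag:
    EmptyOperator(task_id="noop")
```

With catchup=True (the default), deploying this DAG months after start_date would trigger one run per missed daily interval.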

3

Which operator should be used to execute a Python function in an Airflow DAG?

Answer

The PythonOperator executes a Python callable inside an Airflow DAG. The callable is passed via the python_callable parameter and can receive positional arguments via op_args (a list) or keyword arguments via op_kwargs (a dictionary). It is one of the most commonly used operators because it offers great flexibility for running custom Python code.
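A minimal sketch of the pattern described above (assumes Apache Airflow 2.x; the greet function and dag_id are illustrative):

```python
# Illustrative DAG-file fragment (requires Apache Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def greet(name, **context):
    # op_kwargs entries arrive as keyword arguments; Airflow also injects
    # the task context (execution date, task instance, ...) as kwargs.
    print(f"Hello, {name}")

with DAG(
    dag_id="python_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,           # manual triggering only
) as dag:
    greet_task = PythonOperator(
        task_id="greet_task",
        python_callable=greet,
        op_kwargs={"name": "Airflow"},
    )
```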

4

How do you define a dependency between two tasks task_a and task_b so that task_b runs after task_a?

Answer

The most common way is the bitshift syntax: task_a >> task_b (equivalently, task_b << task_a). The explicit methods task_a.set_downstream(task_b) and task_b.set_upstream(task_a) achieve the same result.
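A minimal sketch of wiring this dependency in a DAG file (assumes Apache Airflow 2.x; EmptyOperator stands in for real tasks):

```python
# Illustrative DAG-file fragment (requires Apache Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="dependency_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    task_a = EmptyOperator(task_id="task_a")
    task_b = EmptyOperator(task_id="task_b")

    task_a >> task_b                 # bitshift syntax (most common)
    # task_b << task_a               # equivalent, reversed direction
    # task_a.set_downstream(task_b)  # equivalent explicit method call
```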

5

Which cron expression represents a daily execution at midnight?

Answer

The expression 0 0 * * * runs at minute 0 of hour 0, every day. Airflow also accepts the equivalent preset @daily.
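The conventional answer is 0 0 * * *. A plain-Python breakdown of its five fields, in the order cron defines them:

```python
# Conceptual breakdown (plain Python, no Airflow required) of the five
# cron fields: minute, hour, day-of-month, month, day-of-week.
fields = dict(zip(
    ["minute", "hour", "day_of_month", "month", "day_of_week"],
    "0 0 * * *".split(),
))
# minute 0 of hour 0, with "*" (any) for the remaining fields:
# daily at midnight.
```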
