Question 1

What is Ruff in the Python ecosystem?

Accepted Answer

Ruff is an extremely fast Python linter and code formatter written in Rust. It can replace tools such as Flake8, isort, and Black while running 10 to 100 times faster than those tools. Ruff supports over 700 lint rules and integrates easily into CI/CD pipelines and pre-commit hooks.
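As a rough illustration of how Ruff might be wired into a CI step, here is a minimal Python sketch. The helper names ruff_commands and run_ruff are hypothetical, made up for this example; only the ruff check, ruff format, --fix, and --check invocations come from the Ruff CLI itself. The sketch assumes ruff is on the PATH and skips quietly when it is not.

```python
import shutil
import subprocess


def ruff_commands(path: str = ".", fix: bool = False) -> list[list[str]]:
    """Build the two Ruff invocations: lint first, then formatting.

    Hypothetical helper for illustration. With fix=False (typical in CI),
    nothing is modified: the linter only reports and the formatter only checks.
    """
    lint = ["ruff", "check", path]
    if fix:
        lint.append("--fix")  # apply safe automatic fixes
    if fix:
        fmt = ["ruff", "format", path]  # rewrite files in place
    else:
        fmt = ["ruff", "format", "--check", path]  # fail if files would change
    return [lint, fmt]


def run_ruff(path: str = ".", fix: bool = False) -> int:
    """Run both commands and return the highest exit code (0 means clean)."""
    if shutil.which("ruff") is None:
        print("ruff is not installed; skipping")
        return 0
    worst = 0
    for cmd in ruff_commands(path, fix):
        result = subprocess.run(cmd)
        worst = max(worst, result.returncode)
    return worst


if __name__ == "__main__":
    print(ruff_commands())
```

A CI job could call run_ruff() and fail the build on a non-zero return value, while a developer would more likely run the same commands locally with fix=True.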

Question 2

What is the main role of the pyproject.toml file with Poetry?

Accepted Answer

The pyproject.toml file is the central configuration file of a Python project managed with Poetry. It defines the project metadata (name, version, description), production and development dependencies, scripts, and the configuration of tools such as Ruff or pytest. Because the format is standardized, this single file can take the place of setup.py, requirements.txt, and setup.cfg.
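To make the layout concrete, the sketch below embeds a small pyproject.toml as a string and reads it with Python's standard TOML parser (tomllib, available from Python 3.11; the tomli fallback is an assumption for older interpreters). The project name, versions, author, and script entry point are all hypothetical and chosen only for illustration.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # assumption: the tomli backport on older versions
    import tomli as tomllib

# Hypothetical project file: every name and version here is illustrative.
PYPROJECT = """
[tool.poetry]
name = "demo-pipeline"
version = "0.1.0"
description = "Hypothetical example project"
authors = ["Jane Doe <jane@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.31"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
ruff = "^0.5"

[tool.poetry.scripts]
demo = "demo_pipeline.cli:main"

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]

[tool.pytest.ini_options]
testpaths = ["tests"]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
"""

config = tomllib.loads(PYPROJECT)
poetry = config["tool"]["poetry"]

# Metadata, production dependencies, and development dependencies
# all live under the same [tool.poetry] table.
print(poetry["name"], poetry["version"])
print(sorted(poetry["dependencies"]))
print(sorted(poetry["group"]["dev"]["dependencies"]))

# Tool configuration sits next to the project definition in the same file.
print(config["tool"]["ruff"]["lint"]["select"])
```

Production dependencies, development dependencies, the script entry point, and the Ruff and pytest settings all sit in one file, which is the point the answer above makes.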

Question 3

Which Poetry command installs all dependencies of an existing project?

Accepted Answer

The poetry install command reads the pyproject.toml and poetry.lock files and installs all project dependencies, by default into an isolated virtual environment. If poetry.lock exists, the exact versions it records are installed, which ensures reproducibility. Otherwise, Poetry resolves the dependencies and creates the lock file.
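The decision described above can be modeled in a few lines of Python. This is a simplified sketch of the behavior, not Poetry's actual implementation: the function install_plan is hypothetical, and it only inspects which files are present in a temporary directory.

```python
import tempfile
from pathlib import Path


def install_plan(project_dir: Path) -> str:
    """Describe what `poetry install` would do, based only on which files exist.

    Hypothetical helper: a simplified model, not Poetry's real logic.
    """
    if not (project_dir / "pyproject.toml").is_file():
        return "error: no pyproject.toml found"
    if (project_dir / "poetry.lock").is_file():
        return "install exact versions pinned in poetry.lock"
    return "resolve dependencies from pyproject.toml, then write poetry.lock"


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "pyproject.toml").write_text("[tool.poetry]\nname = 'demo'\n")

    # First run: no lock file yet, so dependencies must be resolved.
    first = install_plan(project)

    # Later runs: the lock file exists, so pinned versions are reused.
    (project / "poetry.lock").write_text("# lock file placeholder\n")
    second = install_plan(project)

print(first)
print(second)
```

In a real project the command is simply poetry install, run from the directory that contains pyproject.toml. Committing poetry.lock to version control is what lets every machine take the second branch.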
