Question 1

What is structured logging in the context of a data pipeline?

Accepted Answer

Structured logging means emitting logs in a machine-parsable format (JSON, key-value pairs) rather than free text. This makes logs easy to filter, search, and aggregate in tools like Cloud Logging, Elasticsearch, or Datadog. In a data pipeline, it simplifies debugging because logs can be filtered by DAG, task_id, run_id, or any business context.
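
As a rough illustration of the idea, the sketch below uses only Python's standard logging and json modules to emit one JSON object per log line. The formatter class, the logger name pipeline_demo, and the sample values (daily_sales, load, 2024-01-01) are hypothetical; the context fields dag_id, task_id and run_id are passed through the standard extra= argument.

```python
import json
import logging
from io import StringIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Hypothetical context fields; pick whatever your pipeline needs.
    CONTEXT_FIELDS = ("dag_id", "task_id", "run_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via `extra=` become attributes on the record.
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("pipeline_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo() -> dict:
    """Log one event and parse it back, as a log backend would."""
    buffer = StringIO()
    logger = build_logger(buffer)
    logger.info(
        "rows loaded",
        extra={"dag_id": "daily_sales", "task_id": "load", "run_id": "2024-01-01"},
    )
    return json.loads(buffer.getvalue())


if __name__ == "__main__":
    print(demo())
```

Because every line is valid JSON, a log backend can index dag_id or run_id as fields instead of relying on text search.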

Question 2

What is the difference between an SLI (Service Level Indicator) and an SLO (Service Level Objective)?

Accepted Answer

An SLI is a measurable metric that quantifies one aspect of service quality (e.g., job success rate, pipeline latency). An SLO is a target set on that metric (e.g., 99.5% of jobs must succeed). The SLA (Service Level Agreement) is the contractual commitment to customers, based on internal SLOs. This hierarchy makes reliability monitoring objective and lets alerts fire before an SLA is violated.
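
The relationship can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the SLI is the job success rate, the SLO is the 99.5% target from the answer above, and the 99% SLA threshold is a hypothetical value chosen only to show an alert firing before the contractual limit is reached. The function name and status labels are also illustrative.

```python
from typing import Tuple


def classify_success_rate(
    succeeded: int,
    total: int,
    slo: float = 0.995,  # internal objective (from the example above)
    sla: float = 0.99,   # hypothetical contractual threshold
) -> Tuple[float, str]:
    """Compute the SLI (success rate) and compare it to the SLO and SLA."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= succeeded <= total:
        raise ValueError("succeeded must be between 0 and total")

    sli = succeeded / total  # the measured indicator

    if sli >= slo:
        status = "ok"          # objective met
    elif sli >= sla:
        status = "alert"       # SLO missed, SLA still intact
    else:
        status = "sla_breach"  # contractual commitment violated
    return sli, status


if __name__ == "__main__":
    for succeeded in (996, 992, 985):
        print(succeeded, classify_success_rate(succeeded, 1000))
```

The "alert" band between the two thresholds is what gives the team time to react before customers are contractually affected.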

Question 3

What is an Expectation in Great Expectations?

Accepted Answer

An Expectation is a declarative assertion about data, such as expect_column_values_to_not_be_null or expect_column_values_to_be_between. From these assertions, Great Expectations automatically generates documentation and actionable validation results. Expectations are grouped into Suites that define the complete quality contract for a dataset.
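
To make the concept concrete without depending on a particular Great Expectations version, here is a library-free sketch in plain Python. It is not the Great Expectations API: the two functions only borrow the names of the expectations mentioned above, and the result dictionaries, the validate helper, and the sample rows are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Expectation = Callable[[List[Row]], Dict[str, Any]]


def expect_column_values_to_not_be_null(column: str) -> Expectation:
    """Declare that no row may have None in `column`."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [r for r in rows if r.get(column) is None]
        return {
            "expectation": "expect_column_values_to_not_be_null",
            "column": column,
            "success": not bad,
            "unexpected_count": len(bad),
        }
    return check


def expect_column_values_to_be_between(
    column: str, min_value: float, max_value: float
) -> Expectation:
    """Declare that non-null values must lie in [min_value, max_value]."""
    def check(rows: List[Row]) -> Dict[str, Any]:
        bad = [
            r for r in rows
            if r.get(column) is not None
            and not (min_value <= r[column] <= max_value)
        ]
        return {
            "expectation": "expect_column_values_to_be_between",
            "column": column,
            "success": not bad,
            "unexpected_count": len(bad),
        }
    return check


def validate(rows: List[Row], suite: List[Expectation]) -> Dict[str, Any]:
    """Run every expectation in the suite and summarise the outcome."""
    results = [expectation(rows) for expectation in suite]
    return {"success": all(r["success"] for r in results), "results": results}


if __name__ == "__main__":
    rows = [{"amount": 10}, {"amount": None}, {"amount": 500}]
    suite = [
        expect_column_values_to_not_be_null("amount"),
        expect_column_values_to_be_between("amount", 0, 100),
    ]
    print(validate(rows, suite))
```

The suite is just a list of declared checks, which mirrors the idea of a quality contract: the data is described once, then validated on every run.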

Monitoring and Observability

What is the main role of Soda in a data pipeline?

What is a runbook in the context of data incident management?
