
Monitoring and Observability
Structured logging, metrics, alerting, SLA/SLO/SLI, data quality checks, Great Expectations, Soda
1. What is structured logging in the context of a data pipeline?
Answer
Structured logging means emitting logs in a machine-parsable format (JSON or key-value pairs) rather than free text. This makes logs easy to filter, search and aggregate in tools such as Cloud Logging, Elasticsearch or Datadog. In a data pipeline, it simplifies debugging because logs can be filtered by DAG, task_id, run_id or any business context attached to the record.
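As an illustration, here is a minimal sketch of JSON logging using only the Python standard library. The `JsonFormatter` class, the `build_logger` helper, and the `dag_id`/`task_id`/`run_id` values are assumptions made for the example, not part of any particular orchestrator's API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical context fields; a real pipeline would choose its own.
    CONTEXT_FIELDS = ("dag_id", "task_id", "run_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for stdout or a log shipper
log = build_logger(buffer)
log.info(
    "rows loaded",
    extra={"dag_id": "daily_sales", "task_id": "load", "run_id": "2024-01-01"},
)
print(buffer.getvalue().strip())
```

Because every record is a JSON object, a log backend can index `dag_id`, `task_id` and `run_id` as fields and filter on them directly instead of relying on text pattern matching.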
2. What is the difference between an SLI (Service Level Indicator) and an SLO (Service Level Objective)?
Answer
An SLI is a measurable metric that quantifies one aspect of service quality (e.g., job success rate, pipeline latency). An SLO is a target set on that metric (e.g., 99.5% of jobs must succeed). An SLA is the contractual commitment made to customers; it builds on the internal SLOs and is typically less strict than them. This hierarchy makes reliability monitoring objective and lets alerts fire before an SLA is violated.
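The relationship between the three terms can be sketched in a few lines of Python. The function names, the status labels, and the 99.5% / 98% targets below are illustrative assumptions, chosen so that the internal SLO is stricter than the SLA.

```python
def success_rate(run_succeeded: list[bool]) -> float:
    """SLI: fraction of pipeline runs that succeeded."""
    if not run_succeeded:
        raise ValueError("no runs observed")
    return sum(run_succeeded) / len(run_succeeded)


def evaluate(sli: float, slo_target: float, sla_target: float) -> str:
    """Compare a measured SLI with the internal SLO and the contractual SLA."""
    if slo_target < sla_target:
        raise ValueError("the internal SLO should be at least as strict as the SLA")
    if sli < sla_target:
        return "sla_breached"  # contractual commitment already violated
    if sli < slo_target:
        return "alert"  # SLO missed, SLA still intact: time to act
    return "ok"


# Hypothetical observation window: 990 successful runs out of 1000.
runs = [True] * 990 + [False] * 10
sli = success_rate(runs)
status = evaluate(sli, slo_target=0.995, sla_target=0.98)
print(f"SLI={sli:.3f} status={status}")
```

In this window the SLI falls below the SLO but stays above the SLA, which is the gap where an alert gives the team time to react.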
3. What is an Expectation in Great Expectations?
Answer
An Expectation is a declarative assertion about data, such as expect_column_values_to_not_be_null or expect_column_values_to_be_between. Great Expectations evaluates each one against the data and automatically generates documentation and actionable validation results. Expectations are grouped into Expectation Suites, which define the complete quality contract for a dataset.
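To make the idea concrete, the sketch below mimics declarative expectations and a suite in plain Python. It deliberately does not use the Great Expectations API: the `Expectation` class, the `not_null` and `between` helpers, and the sample rows are all hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Expectation:
    """A declarative assertion about the values of one column (illustrative only)."""

    name: str
    column: str
    predicate: Callable[[Any], bool]

    def validate(self, rows: list[Row]) -> dict[str, Any]:
        # Collect every value that fails the assertion.
        unexpected = [
            row.get(self.column)
            for row in rows
            if not self.predicate(row.get(self.column))
        ]
        return {
            "expectation": self.name,
            "column": self.column,
            "success": not unexpected,
            "unexpected_count": len(unexpected),
        }


def not_null(column: str) -> Expectation:
    return Expectation("values_not_null", column, lambda v: v is not None)


def between(column: str, low: float, high: float) -> Expectation:
    return Expectation(
        "values_between", column, lambda v: v is not None and low <= v <= high
    )


# A "suite" here is just a list of expectations forming the dataset's contract.
suite = [not_null("order_id"), not_null("amount"), between("amount", 0, 1000)]

rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 12.5},
]

results = [expectation.validate(rows) for expectation in suite]
for result in results:
    print(result)
```

Each result records which assertion ran, whether it passed, and how many values violated it, which is the kind of information a validation report needs to be actionable.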
What is the main role of Soda in a data pipeline?
What is a runbook in the context of data incident management?