
Modern Data Architecture
Data Lake vs Data Warehouse vs Lakehouse, Data Mesh, Data Contracts, schema registry, ADR, governance, data catalog, lineage
1. What is the fundamental difference between a Data Lake and a Data Warehouse?
Answer
A Data Lake stores data in its native (raw) format and applies a schema only when the data is read (schema-on-read), which keeps ingestion cheap and exploration flexible. A Data Warehouse enforces a structured schema when the data is written (schema-on-write) and stores transformed data optimized for analytics. Data Lakes therefore favor flexibility and large-scale, low-cost storage, while Data Warehouses favor query performance and data quality.
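The contrast between schema-on-read and schema-on-write can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample events, the `ORDERS_SCHEMA` mapping, and the three function names are hypothetical, and in-memory lists stand in for real lake and warehouse storage.

```python
import json

# Hypothetical raw events as they might land in a lake: nothing is enforced yet.
RAW_EVENTS = [
    '{"user_id": 1, "amount": "19.90", "country": "FR"}',
    '{"user_id": "2", "amount": 5}',
    '{"user": 3, "amount": "n/a"}',  # wrong key and unparsable amount
]

# Hypothetical schema: column name -> function that casts the raw value.
ORDERS_SCHEMA = {"user_id": int, "amount": float}


def apply_schema(record, schema):
    """Keep only the schema's columns and cast each value; raise if the record does not fit."""
    return {col: cast(record[col]) for col, cast in schema.items()}


def write_to_lake(lines):
    """Schema-on-read, write side: every record is stored untouched."""
    return list(lines)


def read_from_lake(lake, schema):
    """Schema-on-read, read side: the schema is applied at query time.

    Records that do not fit are skipped by this query, but they stay in the lake.
    """
    rows = []
    for line in lake:
        try:
            rows.append(apply_schema(json.loads(line), schema))
        except (KeyError, ValueError, TypeError):
            continue
    return rows


def write_to_warehouse(lines, schema):
    """Schema-on-write: records are validated and cast before they are stored.

    Records that do not fit never enter the table; they are returned as rejects.
    """
    table, rejected = [], []
    for line in lines:
        try:
            table.append(apply_schema(json.loads(line), schema))
        except (KeyError, ValueError, TypeError):
            rejected.append(line)
    return table, rejected
```

With the sample events, `write_to_lake(RAW_EVENTS)` keeps all three records, while `write_to_warehouse(RAW_EVENTS, ORDERS_SCHEMA)` stores two typed rows and rejects the third. The malformed record is still available in the lake for a later read with a different schema, which is the flexibility the answer describes.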
2. What is the main advantage of Lakehouse architecture compared to separate Data Lake and Data Warehouse architectures?
Answer
Lakehouse architecture combines the flexible, low-cost storage of a Data Lake with the ACID transactions, query performance, and governance of a Data Warehouse. Keeping a single copy of the data removes duplication between systems and reduces synchronization cost and complexity. BI and ML workloads can then run on the same platform, using open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
3. Which open table format enables ACID transactions on a Data Lake?
Answer
Delta Lake, Apache Iceberg, and Apache Hudi are the three main open table formats that enable ACID transactions on a Data Lake. Delta Lake, developed by Databricks, uses a transaction log to guarantee atomicity and consistency. Iceberg, created at Netflix, offers advanced partition management and schema evolution. Hudi, developed at Uber, excels in upsert and CDC scenarios. All three add transactional guarantees on top of plain object storage, which is what turns a Data Lake into a Lakehouse.
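The idea of a transaction log over plain files can be illustrated with a toy sketch. This is not the Delta Lake, Iceberg, or Hudi implementation or file layout: the `ToyTableLog` class, its method names, and the `add`/`remove` action shape are assumptions made for the example, and a local directory stands in for object storage. Each commit is one numbered log file created exclusively, so two writers can never both claim the same version.

```python
import json
import os


class ToyTableLog:
    """Toy append-only commit log: one JSON file per table version (illustrative only)."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _path(self, version):
        # Zero-padded names keep the log files in version order.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def _versions(self):
        return sorted(
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        )

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, read_version, actions):
        """Try to publish `actions` as version read_version + 1.

        Mode "x" creates the file only if it does not exist yet, so a writer
        working from a stale version gets FileExistsError and must retry.
        A real system would also need the file contents to appear atomically
        (for example via put-if-absent or rename); this sketch skips that.
        """
        new_version = read_version + 1
        with open(self._path(new_version), "x") as f:
            json.dump(actions, f)
        return new_version

    def snapshot(self, version=None):
        """Replay the log up to `version` and return the data files that are live."""
        live = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(self._path(v)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        live.add(action["file"])
                    elif action["op"] == "remove":
                        live.discard(action["file"])
        return sorted(live)
```

A writer calls `latest_version()`, writes its data files, then calls `commit(read_version, actions)`. All actions in one commit become visible together or not at all, and `snapshot(version=n)` reads the table as it was at an earlier version, which is the kind of guarantee the answer attributes to a transaction log.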
What is the fundamental principle of Data Mesh?
What is a Data Contract in the context of Data Mesh?