
Data Modeling for Data Engineering
Star schema, snowflake, Data Vault, normalization, SCD types, grain, additive metrics
1. What is a star schema?
Answer
A star schema is a dimensional data model where a central fact table is surrounded by dimension tables. The fact table contains metrics and foreign keys to dimensions, which hold descriptive attributes. This simple, denormalized structure optimizes analytical query performance by minimizing joins.
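The layout described above can be sketched with the Python standard library's sqlite3 module. All table and column names here (fact_sales, dim_product, dim_date) are illustrative assumptions, not part of any particular warehouse:

```python
# Minimal star-schema sketch: one central fact table, two dimension
# tables, and an analytical query that joins each dimension once.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes.
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT)""")
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month TEXT)""")

# The fact table holds metrics plus foreign keys to each dimension.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL)""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240101, "2024-01-01", "2024-01"),
                 (20240102, "2024-01-02", "2024-01")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20240101, 2, 19.98),
                 (2, 20240101, 1, 24.99),
                 (1, 20240102, 3, 29.97)])

# An analytical query reaches any dimension in a single join hop.
rows = cur.execute("""
    SELECT p.product_name, SUM(f.quantity) AS units
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 1), ('Widget', 5)]
```

Note that every dimension is one join away from the fact table; that flat, denormalized shape is what keeps analytical queries simple.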
2. What is the difference between a fact table and a dimension table?
Answer
A fact table contains quantitative measures (metrics) and foreign keys to dimension tables; it records events or transactions. A dimension table contains descriptive attributes (who, what, where, when) that enable filtering and grouping of facts. Facts are numeric and aggregatable; dimensions are typically textual and descriptive.
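The split above can be illustrated in plain Python: fact rows supply the numbers to aggregate, while dimension rows supply the attributes to filter and group by. The store names, regions, and amounts below are invented for illustration:

```python
# A dimension keyed by surrogate key, holding descriptive attributes.
dim_store = {
    101: {"store_name": "Downtown", "region": "East"},
    102: {"store_name": "Airport",  "region": "West"},
}

# Fact rows: numeric, aggregatable measures plus a foreign key
# pointing into the dimension.
fact_sales = [
    {"store_key": 101, "quantity": 2, "amount": 30.0},
    {"store_key": 102, "quantity": 1, "amount": 10.0},
    {"store_key": 101, "quantity": 5, "amount": 75.0},
]

# The dimension answers "where"; the fact supplies what gets summed.
east_revenue = sum(
    f["amount"] for f in fact_sales
    if dim_store[f["store_key"]]["region"] == "East"
)
print(east_revenue)  # 105.0
```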
3. What is the grain of a fact table?
Answer
The grain defines the level of detail of a single row in the fact table. It answers the question: what exactly does one row represent? For example, one row per individual sale, one row per product per day, or one row per hour. Defining the grain is the first step in dimensional modeling, as it determines which dimensions apply and what level of aggregation is stored.
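The effect of choosing a grain can be shown by rolling the same events up from transaction grain to a coarser day-per-product grain. The dates, products, and amounts are made-up sample data:

```python
# Same sales stored at two different grains. Detail below the chosen
# grain (here, the individual transactions) is no longer recoverable
# once the coarser table is built.
from collections import defaultdict

# Finest grain: one row per individual sale (transaction grain).
transactions = [
    {"date": "2024-01-01", "product": "Widget", "amount": 10.0},
    {"date": "2024-01-01", "product": "Widget", "amount": 15.0},
    {"date": "2024-01-01", "product": "Gadget", "amount": 20.0},
    {"date": "2024-01-02", "product": "Widget", "amount": 5.0},
]

# Coarser grain: one row per day per product.
daily = defaultdict(float)
for t in transactions:
    daily[(t["date"], t["product"])] += t["amount"]

print(sorted(daily.items()))
# [(('2024-01-01', 'Gadget'), 20.0),
#  (('2024-01-01', 'Widget'), 25.0),
#  (('2024-01-02', 'Widget'), 5.0)]
```

Four transaction-grain rows collapse to three daily rows; which dimensions are even meaningful (here, date and product) follows directly from the grain statement.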
4. What is the difference between a star schema and a snowflake schema?
5. What is a conformed dimension?