
Data Modeling for Data Engineering
Star schema, snowflake, Data Vault, normalization, SCD types, grain, additive metrics
1. What is a star schema?
Answer
A star schema is a dimensional data model where a central fact table is surrounded by dimension tables. The fact table contains metrics and foreign keys to dimensions, which hold descriptive attributes. This simple, denormalized structure optimizes analytical query performance by minimizing joins.
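The layout described above can be sketched with the Python standard library's sqlite3 module. All table and column names here (fact_sales, dim_product, dim_date) are illustrative assumptions, not part of any particular warehouse:

```python
# Minimal star-schema sketch: one central fact table, two dimension
# tables, and an analytical query that joins each dimension once.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes.
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT)""")
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month TEXT)""")

# The fact table holds metrics plus foreign keys to each dimension.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL)""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20240101, "2024-01-01", "2024-01"),
                 (20240102, "2024-01-02", "2024-01")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20240101, 2, 19.98),
                 (2, 20240101, 1, 24.99),
                 (1, 20240102, 3, 29.97)])

# An analytical query reaches any dimension in a single join hop.
rows = cur.execute("""
    SELECT p.product_name, SUM(f.quantity) AS units
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 1), ('Widget', 5)]
```

Note that every dimension is one join away from the fact table; that flat, denormalized shape is what keeps analytical queries simple.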
2. What is the difference between a fact table and a dimension table?
Answer
A fact table contains quantitative measures (metrics) and foreign keys to dimension tables; it records events or transactions. A dimension table contains descriptive attributes (who, what, where, when) that enable filtering and grouping of facts. Facts are numeric and aggregatable; dimensions are typically textual and descriptive.
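The split above can be illustrated in plain Python: fact rows supply the numbers to aggregate, while dimension rows supply the attributes to filter and group by. The store names, regions, and amounts below are invented for illustration:

```python
# A dimension keyed by surrogate key, holding descriptive attributes.
dim_store = {
    101: {"store_name": "Downtown", "region": "East"},
    102: {"store_name": "Airport",  "region": "West"},
}

# Fact rows: numeric, aggregatable measures plus a foreign key
# pointing into the dimension.
fact_sales = [
    {"store_key": 101, "quantity": 2, "amount": 30.0},
    {"store_key": 102, "quantity": 1, "amount": 10.0},
    {"store_key": 101, "quantity": 5, "amount": 75.0},
]

# The dimension answers "where"; the fact supplies what gets summed.
east_revenue = sum(
    f["amount"] for f in fact_sales
    if dim_store[f["store_key"]]["region"] == "East"
)
print(east_revenue)  # 105.0
```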
3. What is the grain of a fact table?
Answer
The grain defines the level of detail of a single row in the fact table. It answers the question: what exactly does one row represent? For example, one row per individual sale, one row per product per day, or one row per hour. Defining the grain is the first step in dimensional modeling, as it determines which dimensions apply and what level of aggregation is stored.
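The effect of choosing a grain can be shown by rolling the same events up from transaction grain to a coarser day-per-product grain. The dates, products, and amounts are made-up sample data:

```python
# Same sales stored at two different grains. Detail below the chosen
# grain (here, the individual transactions) is no longer recoverable
# once the coarser table is built.
from collections import defaultdict

# Finest grain: one row per individual sale (transaction grain).
transactions = [
    {"date": "2024-01-01", "product": "Widget", "amount": 10.0},
    {"date": "2024-01-01", "product": "Widget", "amount": 15.0},
    {"date": "2024-01-01", "product": "Gadget", "amount": 20.0},
    {"date": "2024-01-02", "product": "Widget", "amount": 5.0},
]

# Coarser grain: one row per day per product.
daily = defaultdict(float)
for t in transactions:
    daily[(t["date"], t["product"])] += t["amount"]

print(sorted(daily.items()))
# [(('2024-01-01', 'Gadget'), 20.0),
#  (('2024-01-01', 'Widget'), 25.0),
#  (('2024-01-02', 'Widget'), 5.0)]
```

Four transaction-grain rows collapse to three daily rows; which dimensions are even meaningful (here, date and product) follows directly from the grain statement.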
4. What is the difference between a star schema and a snowflake schema?
5. What is a conformed dimension?