Question 1

Which window function assigns a unique sequential number to each row within a partition?

Accepted Answer

ROW_NUMBER() assigns a unique sequential number to each row within a partition, starting at 1. Unlike RANK() and DENSE_RANK() which can assign the same number to equal values, ROW_NUMBER() always guarantees unique numbers, even for identical values in the ORDER BY clause.

Question 2

What is the main difference between RANK() and DENSE_RANK()?

Accepted Answer

RANK() leaves gaps in the numbering after tied values. For example, if two rows are ranked 1, the next one will be ranked 3. DENSE_RANK() doesn't leave gaps: after two rows ranked 1, the next will be ranked 2. This distinction is important for ranking analyses where gaps can skew statistics.

Question 3

How to calculate a running total (cumulative sum) with a window function?

Accepted Answer

SUM() OVER with ORDER BY and no explicit frame uses ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default, which calculates the sum of all preceding rows up to and including the current row. This is the standard behavior for creating running totals in time-series or sequential analyses.

Advanced SQL for Data Engineering

Which window function assigns a unique sequential number to each row within a partition?

Answer

What is the main difference between RANK() and DENSE_RANK()?

Answer

How to calculate a running total (cumulative sum) with a window function?

Answer

What is the role of the PARTITION BY clause in a window function?

Which function allows accessing the value of the previous row in a window function?

Other Data Engineering interview topics

Linux & Shell - Fundamentals

Git & GitHub - Fundamentals

Advanced Python for Data Engineering

Docker - Fundamentals

Google Cloud Platform - Fundamentals

CI/CD and Code Quality

Docker Compose

FastAPI - Data APIs

Data Lake - Architecture and Ingestion

BigQuery for Data Engineering

PostgreSQL - Administration

Data Modeling for Data Engineering

Fivetran & Airbyte - Data Ingestion

dbt - Fundamentals

Apache Airflow - Fundamentals

Kubernetes - Fundamentals

dbt - Advanced Features

ETL / ELT / ETLT Patterns

Apache Airflow - Advanced

Airflow + dbt - Pipeline Orchestration

PySpark - Large-Scale Processing

Google Pub/Sub - Data Streaming

Apache Beam & Dataflow

Kubernetes - Production and Scaling

Terraform - Infrastructure as Code

NoSQL Databases

Modern Data Architecture

Monitoring and Observability

IAM and Data Security

Master Data Engineering for your next interview