
Advanced SQL for Data Engineering
Window functions, recursive CTEs, analytical queries, optimization, EXPLAIN ANALYZE, indexing, partitioning
1Which window function assigns a unique sequential number to each row within a partition?
Which window function assigns a unique sequential number to each row within a partition?
Answer
ROW_NUMBER() assigns a unique sequential number to each row within a partition, starting at 1. Unlike RANK() and DENSE_RANK() which can assign the same number to equal values, ROW_NUMBER() always guarantees unique numbers, even for identical values in the ORDER BY clause.
2What is the main difference between RANK() and DENSE_RANK()?
What is the main difference between RANK() and DENSE_RANK()?
Answer
RANK() leaves gaps in the numbering after tied values. For example, if two rows are ranked 1, the next one will be ranked 3. DENSE_RANK() doesn't leave gaps: after two rows ranked 1, the next will be ranked 2. This distinction is important for ranking analyses where gaps can skew statistics.
3How to calculate a running total (cumulative sum) with a window function?
How to calculate a running total (cumulative sum) with a window function?
Answer
SUM() OVER with ORDER BY and no explicit frame uses ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default, which calculates the sum of all preceding rows up to and including the current row. This is the standard behavior for creating running totals in time-series or sequential analyses.
What is the role of the PARTITION BY clause in a window function?
Which function allows accessing the value of the previous row in a window function?
+17 interview questions
Other Data Engineering interview topics
Linux & Shell - Fundamentals
Git & GitHub - Fundamentals
Advanced Python for Data Engineering
Docker - Fundamentals
Google Cloud Platform - Fundamentals
CI/CD and Code Quality
Docker Compose
FastAPI - Data APIs
Data Lake - Architecture and Ingestion
BigQuery for Data Engineering
PostgreSQL - Administration
Data Modeling for Data Engineering
Fivetran & Airbyte - Data Ingestion
dbt - Fundamentals
Apache Airflow - Fundamentals
Kubernetes - Fundamentals
dbt - Advanced Features
ETL / ELT / ETLT Patterns
Apache Airflow - Advanced
Airflow + dbt - Pipeline Orchestration
PySpark - Large-Scale Processing
Google Pub/Sub - Data Streaming
Apache Beam & Dataflow
Kubernetes - Production and Scaling
Terraform - Infrastructure as Code
NoSQL Databases
Modern Data Architecture
Monitoring and Observability
IAM and Data Security
Master Data Engineering for your next interview
Access all questions, flashcards, technical tests, code review exercises and interview simulators.
Start for free