Data Engineering

BigQuery for Data Engineering

Serverless architecture, partitioning, clustering, costs, UDFs, federated queries, scheduled queries, materialized views

20 interview questions · Mid-Level
1

What storage architecture does BigQuery use?

Answer

BigQuery has a serverless architecture and stores data in a columnar format called Capacitor. Storage and compute are separated, so each scales independently and is billed separately. Columnar storage suits analytical queries because a query reads only the columns it references, which significantly reduces I/O.
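
The effect of a columnar layout can be illustrated with a toy model. The sketch below is not how Capacitor is implemented; it only counts how many values a row-oriented and a column-oriented layout would have to touch to sum one column. The table contents and the function names are made up for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The data and helper names are hypothetical; this is not Capacitor itself.
rows = [
    {"user_id": 1, "country": "FR", "amount": 10.0},
    {"user_id": 2, "country": "US", "amount": 25.5},
    {"user_id": 3, "country": "FR", "amount": 7.25},
]

# Columnar layout: one contiguous list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def values_read_row_layout(rows):
    """A row store reads every field of every row, whatever the query needs."""
    return sum(len(row) for row in rows)


def values_read_column_layout(columns, wanted):
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


# Equivalent of: SELECT SUM(amount) FROM the table
total_amount = sum(columns["amount"])
row_reads = values_read_row_layout(rows)
col_reads = values_read_column_layout(columns, ["amount"])

print(total_amount, row_reads, col_reads)
```

With three columns, the columnar layout touches a third of the values here; on wide analytical tables the gap grows with the number of unused columns.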

2

What is the main advantage of table partitioning in BigQuery?

Answer

Partitioning divides a large table into smaller segments based on a column (usually a date). When a query filters on the partitioning column, BigQuery can skip irrelevant partitions (partition pruning), reducing the amount of data scanned. This improves performance and lowers costs, since on-demand pricing is based on the volume of data processed.
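
Partition pruning can be sketched with a small simulation. This is an illustrative model, not BigQuery's engine: the table is a dictionary keyed by day, each row carries a made-up size in bytes, and the `scan` function is a hypothetical helper that reports how many bytes a date-filtered query would read with and without pruning.

```python
from datetime import date

# Hypothetical table: partition key (day) -> list of (event_id, size_in_bytes).
partitions = {
    date(2024, 1, 1): [("a", 100), ("b", 150)],
    date(2024, 1, 2): [("c", 200)],
    date(2024, 1, 3): [("d", 50), ("e", 75)],
}


def scan(partitions, start, end, prune=True):
    """Return (matching_ids, bytes_scanned) for rows with start <= day <= end."""
    matched, scanned = [], 0
    for day, rows in partitions.items():
        in_range = start <= day <= end
        if prune and not in_range:
            continue  # pruning: the whole partition is skipped without being read
        for event_id, size in rows:
            scanned += size  # every row that is read counts toward bytes scanned
            if in_range:
                matched.append(event_id)
    return matched, scanned


day = date(2024, 1, 2)
pruned_ids, pruned_bytes = scan(partitions, day, day, prune=True)
full_ids, full_bytes = scan(partitions, day, day, prune=False)

print(pruned_ids, pruned_bytes)
print(full_ids, full_bytes)
```

Both calls return the same rows; only the number of bytes read differs, which is the quantity that drives cost under on-demand pricing.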

3

What types of partitioning are available in BigQuery?

Answer

BigQuery supports three partitioning types: by time-unit column (a DATE, TIMESTAMP, or DATETIME column; the most common), by integer range, and by ingestion time (exposed through the _PARTITIONTIME pseudocolumn). Date partitioning is recommended for time-series data because it enables efficient partition pruning on date filters.
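
As a rough sketch, the three types differ mainly in where the partition key comes from. The helpers below are hypothetical and only model the assignment logic: a daily key derived from a column value, a daily key derived from the load time, and an integer-range bucket. The `__NULL__` and `__UNPARTITIONED__` labels are modeled on BigQuery's special partitions, and labeling a bucket by its range start is an assumption made for readability.

```python
from datetime import datetime


def column_partition_id(event_time):
    """Time-unit column partitioning (daily): key comes from a column in the row."""
    return event_time.strftime("%Y%m%d")


def ingestion_partition_id(load_time):
    """Ingestion-time partitioning (daily): key comes from when the row was loaded."""
    return load_time.strftime("%Y%m%d")


def integer_range_partition_id(value, start, end, interval):
    """Integer-range partitioning over [start, end) in steps of `interval`."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    bucket_start = start + ((value - start) // interval) * interval
    return str(bucket_start)


event_time = datetime(2024, 3, 15, 22, 30)
load_time = datetime(2024, 3, 16, 1, 5)

print(column_partition_id(event_time))
print(ingestion_partition_id(load_time))
print(integer_range_partition_id(37, 0, 100, 10))
```

The same event lands in different daily partitions depending on the type: the column-based key follows the event's own timestamp, while the ingestion-time key follows the load.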

4

What is the difference between partitioning and clustering in BigQuery?

5

How do you optimize query costs in BigQuery?
