Data Engineering

ETL / ELT / ETLT Patterns

ETL vs ELT vs ETLT, batch vs micro-batch vs streaming, idempotence, error handling, dead letter queues, data quality, lineage

20 interview questions · Senior
1

What is the main difference between ETL and ELT?

Answer

In ETL (Extract-Transform-Load), data is transformed on separate infrastructure (an intermediate server or processing engine) before being loaded into the destination. In ELT (Extract-Load-Transform), raw data is first loaded into the destination (typically a cloud data warehouse), then transformed inside it using the warehouse's own compute. ELT became popular with the rise of cloud data warehouses such as BigQuery, Snowflake, and Redshift, which offer elastic compute.
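To make the difference in ordering concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the warehouse; the table names, the sample rows, and the `run_etl` / `run_elt` helpers are hypothetical and only illustrate where the transformation runs.

```python
import sqlite3

# Hypothetical source rows: amounts arrive as strings, as they might from a CSV export.
SOURCE_ROWS = [("o1", "19.90"), ("o2", "5.00"), ("o3", "12.50")]


def run_etl(conn, rows):
    """ETL: transform in the pipeline process, then load only the final shape."""
    # The transformation (string -> integer cents) happens outside the database.
    transformed = [(order_id, round(float(amount) * 100)) for order_id, amount in rows]
    conn.execute(
        "CREATE TABLE orders_etl (order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)


def run_elt(conn, rows):
    """ELT: load the raw data unchanged, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    # The same transformation, now executed by the database engine itself.
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, "
        "CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents "
        "FROM raw_orders"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, SOURCE_ROWS)
    run_elt(conn, SOURCE_ROWS)
    print(conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall())
    print(conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall())
```

Both paths end with the same curated table. The ELT path additionally keeps `raw_orders`, so the transformation can be rerun or changed later without re-extracting from the source.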

2

What is the main advantage of the ELT approach compared to traditional ETL?

Answer

The ELT approach leverages the elastic compute power of modern cloud data warehouses (BigQuery, Snowflake, Redshift). Instead of maintaining separate transformation infrastructure that can become a bottleneck, transformations run on the warehouse's own scaling capabilities. This reduces operational complexity and makes it possible to process much larger data volumes, often with little or no manual resource provisioning.

3

What is the ETLT pattern and when is it relevant?

Answer

ETLT combines both approaches: a light first transformation is applied before loading (cleaning, filtering, anonymization), then the data is loaded and the more complex transformations run in the data warehouse. This pattern is useful when certain transformations must happen upstream: for compliance (masking sensitive data before it is loaded), for volume reduction (early filtering), or to normalize heterogeneous source formats.
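The two transformation stages can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: SQLite stands in for the warehouse, the record fields (`email`, `country`, `amount`, `is_test`) and both helper functions are hypothetical, and an unsalted SHA-256 hash is used only as a simple masking example (a real compliance setup would likely need a keyed hash or tokenization).

```python
import hashlib
import sqlite3


def light_transform(records):
    """First T (before load): drop test rows and mask emails."""
    for rec in records:
        if rec["is_test"]:
            continue  # early filtering reduces the volume that gets loaded
        normalized = rec["email"].strip().lower()
        # Assumption: an unsalted hash is enough for this illustration only.
        email_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        yield (email_hash, rec["country"], rec["amount"])


def load_and_transform(conn, rows):
    """L, then second T: load the masked rows and aggregate with SQL in the warehouse."""
    conn.execute(
        "CREATE TABLE staged_purchases (email_hash TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO staged_purchases VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE revenue_by_country AS "
        "SELECT country, SUM(amount) AS revenue "
        "FROM staged_purchases GROUP BY country"
    )


if __name__ == "__main__":
    extracted = [
        {"email": "Ana@example.com", "country": "FR", "amount": 10.0, "is_test": False},
        {"email": "bob@example.com", "country": "FR", "amount": 5.0, "is_test": False},
        {"email": "qa@example.com", "country": "FR", "amount": 99.0, "is_test": True},
        {"email": "eve@example.com", "country": "DE", "amount": 7.5, "is_test": False},
    ]
    conn = sqlite3.connect(":memory:")
    load_and_transform(conn, light_transform(extracted))
    print(conn.execute("SELECT * FROM revenue_by_country ORDER BY country").fetchall())
```

The raw email never reaches the warehouse, which is the point of doing the first transformation upstream; the heavier aggregation is left to the warehouse engine.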

4

What is idempotence in the context of data pipelines?

5

How do you implement idempotence when loading data into a table?
