Question 1

What is the main difference between ETL and ELT?

Accepted Answer

In ETL (Extract-Transform-Load), data is transformed on an intermediate server before being loaded into the destination. In ELT (Extract-Load-Transform), raw data is first loaded into the destination (typically a cloud data warehouse) and then transformed inside it, using the warehouse's own compute power. ELT has become popular with cloud data warehouses such as BigQuery, Snowflake, and Redshift, which offer elastic compute power.
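To make the difference concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the destination warehouse (an assumption; a real setup would target BigQuery, Snowflake, Redshift, etc.). The source rows and the table names `etl_orders`, `raw_orders` and `elt_orders` are hypothetical. Both paths produce the same cleaned table. Only the place where the transformation runs changes.

```python
import sqlite3

# Hypothetical source rows, as they might come out of an extraction step.
SOURCE_ROWS = [
    ("o1", "  Alice ", "12.50"),
    ("o2", "bob", "7.25"),
    ("o3", "CAROL", "30.00"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in Python (the 'intermediate server'), then load the result."""
    transformed = [
        (order_id, customer.strip().lower(), float(amount))
        for order_id, customer, amount in SOURCE_ROWS
    ]
    conn.execute("CREATE TABLE etl_orders (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO etl_orders VALUES (?, ?, ?)", transformed)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows as-is, then transform with SQL inside the destination."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    # The transformation runs on the destination's own SQL engine.
    conn.execute(
        """
        CREATE TABLE elt_orders AS
        SELECT order_id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM raw_orders
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    etl_rows = conn.execute("SELECT * FROM etl_orders ORDER BY order_id").fetchall()
    elt_rows = conn.execute("SELECT * FROM elt_orders ORDER BY order_id").fetchall()
    print(etl_rows == elt_rows)  # same result, different place of transformation
```

In the ELT path the raw table stays available in the destination, so a transformation can be rewritten and re-run later without extracting the data again.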

Question 2

What is the main advantage of the ELT approach compared to traditional ETL?

Accepted Answer

The ELT approach leverages the elastic compute power of modern cloud data warehouses (BigQuery, Snowflake, Redshift). Instead of maintaining a separate transformation infrastructure that can become a bottleneck, transformations run directly in the warehouse and benefit from its scaling capabilities. This reduces operational complexity and makes it possible to process much larger data volumes without provisioning resources manually.

Question 3

What is the ETLT pattern and when is it relevant?

Accepted Answer

ETLT combines both approaches: a light initial transformation is applied during extraction (cleaning, filtering, anonymization), then the data is loaded and more complex transformations are run in the data warehouse. This pattern is useful when certain transformations must happen upstream: for compliance (masking sensitive data before it is loaded), for volume reduction (filtering early), or to normalize heterogeneous source formats.
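The three upstream reasons can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the warehouse; the event records, the `is_test` flag and the table names are hypothetical. The first "T" drops test traffic, hashes the email with SHA-256 and normalizes country codes before loading; the second "T" is a SQL aggregation run inside the warehouse.

```python
import hashlib
import sqlite3

# Hypothetical raw events from a source system; "email" is treated as sensitive.
RAW_EVENTS = [
    {"event_id": 1, "email": "alice@example.com", "country": "fr", "amount": 10.0, "is_test": False},
    {"event_id": 2, "email": "bob@example.com", "country": "FR", "amount": 5.0, "is_test": False},
    {"event_id": 3, "email": "qa@example.com", "country": "us", "amount": 99.0, "is_test": True},
    {"event_id": 4, "email": "carol@example.com", "country": "us", "amount": 20.0, "is_test": False},
]


def light_transform(events):
    """First T (upstream): filter early and mask PII before anything is loaded."""
    for e in events:
        if e["is_test"]:  # volume reduction: drop test traffic before loading
            continue
        yield (
            e["event_id"],
            # Compliance: keep a SHA-256 digest instead of the raw email.
            # An unsalted hash is pseudonymization, not full anonymization.
            hashlib.sha256(e["email"].encode("utf-8")).hexdigest(),
            e["country"].upper(),  # normalize heterogeneous source formats
            e["amount"],
        )


def load(conn, rows):
    """L: load the lightly transformed rows into the warehouse (SQLite stand-in)."""
    conn.execute(
        "CREATE TABLE events (event_id INTEGER, email_hash TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)


def warehouse_transform(conn):
    """Second T: heavier modeling done in SQL inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE revenue_by_country AS
        SELECT country,
               COUNT(DISTINCT email_hash) AS customers,
               SUM(amount)                AS revenue
        FROM events
        GROUP BY country
        """
    )


conn = sqlite3.connect(":memory:")
load(conn, light_transform(RAW_EVENTS))
warehouse_transform(conn)
print(conn.execute("SELECT * FROM revenue_by_country ORDER BY country").fetchall())
# → [('FR', 2, 15.0), ('US', 1, 20.0)]
```

The raw email never reaches the warehouse, yet hashed values still allow counting distinct customers there. If stronger protection is required, a keyed hash or tokenization service would be a more appropriate choice than a plain hash.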
