
ETL / ELT / ETLT Patterns
ETL vs ELT vs ETLT, batch vs micro-batch vs streaming, idempotence, error handling, dead letter queues, data quality, lineage
What is the main difference between ETL and ELT?
Answer
In ETL (Extract-Transform-Load), data is transformed on an intermediate server before being loaded into the destination. In ELT (Extract-Load-Transform), raw data is first loaded into the destination (typically a cloud data warehouse), then transformed directly within it using its compute power. ELT has become popular with cloud data warehouses like BigQuery, Snowflake or Redshift that offer elastic compute power.
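The contrast can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `extract()` source is hypothetical, and an in-memory SQLite database stands in for a cloud warehouse such as BigQuery or Snowflake.

```python
import sqlite3

def extract():
    # Hypothetical source: raw order records with inconsistent formats
    return [
        {"id": 1, "amount": "19.90", "country": "fr"},
        {"id": 2, "amount": "5.00", "country": "FR"},
    ]

def etl(conn):
    # ETL: transform in the pipeline process, then load clean rows
    rows = extract()
    clean = [(r["id"], float(r["amount"]), r["country"].upper()) for r in rows]
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)

def elt(conn):
    # ELT: load raw rows as-is, then transform with SQL inside the warehouse
    rows = extract()
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(r["id"], r["amount"], r["country"]) for r in rows],
    )
    # The warehouse engine does the work (SQLite here stands in for BigQuery/Snowflake)
    conn.execute("""
        CREATE TABLE orders_clean AS
        SELECT id, CAST(amount AS REAL) AS amount, UPPER(country) AS country
        FROM raw_orders
    """)
```

Both paths end with the same clean table; the difference is *where* the casting and normalization run — in the pipeline's own process (ETL) or in the destination engine's SQL (ELT).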
What is the main advantage of the ELT approach compared to traditional ETL?
Answer
The ELT approach leverages the elastic compute power of modern cloud data warehouses (BigQuery, Snowflake, Redshift). Instead of maintaining separate transformation infrastructure that can become a bottleneck, transformations directly use the data warehouse's scaling capabilities. This reduces operational complexity and enables processing much larger data volumes without manual resource provisioning.
What is the ETLT pattern and when is it relevant?
Answer
ETLT combines both approaches: a first light transformation is performed during extraction (cleaning, filtering, anonymization), then data is loaded and more complex transformations are applied in the data warehouse. This pattern is useful when certain transformations must be done upstream for compliance reasons (masking sensitive data before loading), volume reduction (early filtering), or normalizing heterogeneous source formats.
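A compliance-driven ETLT flow can be sketched as follows — a hedged example, with hypothetical function and table names, using SHA-256 pseudonymization as the light upstream transform and SQLite standing in for the warehouse.

```python
import hashlib
import sqlite3

def extract_with_light_transform(records):
    """First 'T' of ETLT: mask sensitive fields before they reach the warehouse."""
    return [
        {
            # Pseudonymize the email so raw PII is never loaded
            "user_id": hashlib.sha256(r["email"].encode()).hexdigest()[:16],
            "amount": r["amount"],
        }
        for r in records
    ]

def load_then_transform(conn, records):
    """Load masked rows, then run the heavier second 'T' inside the warehouse."""
    conn.execute("CREATE TABLE raw_sales (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_sales VALUES (:user_id, :amount)", records)
    conn.execute("""
        CREATE TABLE sales_by_user AS
        SELECT user_id, SUM(amount) AS total
        FROM raw_sales
        GROUP BY user_id
    """)
```

The key property is that the warehouse only ever stores the hashed identifier, while aggregation — the expensive part — still benefits from the warehouse's compute.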
What is idempotence in the context of data pipelines?
Answer
A pipeline step is idempotent if running it several times with the same input leaves the destination in the same state as running it once. This matters because re-runs are routine in practice: retries after failures, backfills, and scheduler restarts. A non-idempotent step (for example, a plain append) silently duplicates or corrupts data on every re-run.
How to implement idempotence when loading data into a table?
Answer
Common techniques include: an upsert/MERGE keyed on a business identifier, so re-loaded rows overwrite rather than duplicate; delete-then-insert scoped to the partition being reloaded (typically the execution date), wrapped in a single transaction; or writing to a staging table and atomically swapping it into place. Plain INSERT appends should be avoided for any load that may be retried.
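The delete-then-insert pattern for an idempotent partition load can be sketched as below — a minimal example with hypothetical names, again using SQLite in place of a real warehouse. Running it twice for the same date leaves exactly one copy of the data.

```python
import sqlite3

def idempotent_load(conn, run_date, rows):
    """Reload one date partition; delete-then-insert makes re-runs safe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (run_date TEXT, event_id INTEGER, payload TEXT)"
    )
    with conn:  # one transaction: the delete and insert commit (or roll back) together
        conn.execute("DELETE FROM events WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?)",
            [(run_date, r["event_id"], r["payload"]) for r in rows],
        )
```

The transaction is what makes this safe: a failure mid-load rolls back the delete, so a retry always starts from a consistent state.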