
Fivetran & Airbyte - Data Ingestion
Connectors, sync modes (full, incremental), CDC, schema evolution, transformations, monitoring
1. What is the main difference between Fivetran and Airbyte in terms of deployment model?
Answer
Fivetran is a fully managed SaaS product: Fivetran runs and maintains the infrastructure. Airbyte is open source and can be self-hosted on your own infrastructure (Docker, Kubernetes), in addition to its managed cloud offering. Self-hosting gives more control over data and costs, whereas Fivetran reduces operational work by handling all maintenance for you.
2. What is a connector in the context of Fivetran or Airbyte?
Answer
A connector is a preconfigured component that extracts data from a specific source (database, API, SaaS application) and loads it into a destination (data warehouse, data lake). Each connector handles authentication, pagination, error handling, and schema mapping for its source, so you do not have to write custom integration code.
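To make the connector's responsibilities concrete, here is a minimal Python sketch of two of them: pagination and schema mapping. It is not the Fivetran or Airbyte SDK. The class `PaginatedConnector`, the `fetch_page` callable, the page shape (`records`, `next`), and the field names are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterator, Optional


class PaginatedConnector:
    """Hypothetical connector: pages through a source and renames fields."""

    def __init__(self, fetch_page: Callable[[Optional[str]], dict], field_map: dict):
        # fetch_page(token) is assumed to return {"records": [...], "next": token or None}
        self.fetch_page = fetch_page
        # field_map maps source field names to destination column names
        self.field_map = field_map

    def read(self) -> Iterator[dict]:
        token = None
        while True:
            page = self.fetch_page(token)
            for record in page["records"]:
                # Schema mapping: keep only mapped fields, under destination names
                yield {dest: record.get(src) for src, dest in self.field_map.items()}
            token = page.get("next")
            if token is None:
                break


# A fake two-page source standing in for a real API (illustrative data only)
PAGES = {
    None: {"records": [{"Id": 1, "Email": "a@x.io"}], "next": "p2"},
    "p2": {"records": [{"Id": 2, "Email": "b@x.io"}], "next": None},
}


def fake_fetch(token: Optional[str]) -> dict:
    return PAGES[token]


connector = PaginatedConnector(fake_fetch, {"Id": "id", "Email": "email"})
rows = list(connector.read())
```

A real connector would also cover authentication, retries, and rate limits. The point here is only that the caller asks for rows and never sees page tokens or source field names.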
3. What is the difference between Full Refresh and Incremental sync?
Answer
Full Refresh extracts all data from the source on every sync and typically replaces the existing data in the destination. Incremental sync transfers only rows that are new or changed since the last sync, tracked with a cursor (such as a timestamp or an auto-increment ID). Incremental is more efficient in time, cost, and load on the source.
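The two modes can be sketched in a few lines of Python. This is a simplified in-memory model, not how either tool is implemented. The `Destination` class, the `id` primary key, and the `updated_at` cursor column are assumptions made for the example. Timestamps are ISO date strings, which compare correctly as plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Destination:
    rows: dict = field(default_factory=dict)  # destination table, keyed by primary key
    cursor: Optional[str] = None              # highest updated_at value synced so far


def full_refresh(source_rows: list, dest: Destination) -> int:
    """Re-read every source row and replace the destination contents."""
    dest.rows = {row["id"]: row for row in source_rows}
    return len(source_rows)


def incremental_sync(source_rows: list, dest: Destination) -> int:
    """Upsert only rows whose updated_at is past the saved cursor."""
    changed = [
        row for row in source_rows
        if dest.cursor is None or row["updated_at"] > dest.cursor
    ]
    for row in changed:
        dest.rows[row["id"]] = row
    if changed:
        # Advance the cursor so the next sync skips these rows
        dest.cursor = max(row["updated_at"] for row in changed)
    return len(changed)


source = [
    {"id": 1, "updated_at": "2024-01-01", "name": "a"},
    {"id": 2, "updated_at": "2024-01-02", "name": "b"},
]
dest = Destination()
first = incremental_sync(source, dest)   # first run: no cursor yet, so both rows move

source[0] = {"id": 1, "updated_at": "2024-01-03", "name": "a2"}       # an update
source.append({"id": 3, "updated_at": "2024-01-04", "name": "c"})     # an insert
second = incremental_sync(source, dest)  # only the updated and the new row move
```

In this model the second sync moves two rows, where a full refresh would re-read all three. The gap grows with table size, which is where the savings in time, cost, and source load come from.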
What is CDC (Change Data Capture) and why is it used in ingestion tools?
What main advantage does CDC provide compared to timestamp-based incremental sync?