Data Engineering

Fivetran & Airbyte - Data Ingestion

Connectors, sync modes (full, incremental), CDC, schema evolution, transformations, monitoring

20 interview questions · Mid-Level
1

What is the main difference between Fivetran and Airbyte in terms of deployment model?

Answer

Fivetran is a fully managed SaaS product: Fivetran runs and maintains the infrastructure. Airbyte is open source and can be self-hosted on your own infrastructure (Docker, Kubernetes), and it also has a cloud offering. Self-hosting gives more control over data and costs, whereas Fivetran's managed model reduces operational work because the vendor handles all maintenance.

2

What is a connector in the context of Fivetran or Airbyte?

Answer

A connector is a preconfigured component that extracts data from a specific source (database, API, SaaS application) and loads it into a destination (data warehouse, data lake). Each connector handles authentication, pagination, error handling, and schema mapping for its source, so you do not have to write custom integration code.
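As a rough illustration of what a connector takes off your hands, the sketch below follows pagination tokens and maps source field names to destination column names. It is not the Fivetran or Airbyte SDK: `fetch_page`, `FIELD_MAP`, `read_records`, and the fake API pages are all hypothetical names made up for this example, and authentication and error handling are left out.

```python
from typing import Iterator, Optional

# Hypothetical paginated API: each page holds records and a "next" page token.
_FAKE_API_PAGES = {
    None: {
        "records": [
            {"Id": 1, "FullName": "Ada"},
            {"Id": 2, "FullName": "Linus"},
        ],
        "next": "p2",
    },
    "p2": {"records": [{"Id": 3, "FullName": "Grace"}], "next": None},
}

# Schema mapping from source field names to destination column names.
FIELD_MAP = {"Id": "id", "FullName": "full_name"}


def fetch_page(token: Optional[str]) -> dict:
    """Stand-in for an authenticated HTTP call to the source API."""
    return _FAKE_API_PAGES[token]


def read_records() -> Iterator[dict]:
    """Follow pagination tokens until none is left, yielding mapped rows."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        for record in page["records"]:
            # Rename each source field to its destination column.
            yield {FIELD_MAP[key]: value for key, value in record.items()}
        token = page["next"]
        if token is None:
            break


rows = list(read_records())
print(rows)
```

A real connector would add credentials, retries, and rate-limit handling around `fetch_page`; the loop structure stays the same.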

3

What is the difference between Full Refresh and Incremental sync?

Answer

Full Refresh extracts all data from the source on each sync and typically replaces the existing data in the destination. Incremental sync transfers only rows that are new or changed since the last sync, tracked with a cursor (for example a timestamp or an auto-increment ID). Incremental sync is more efficient in time, cost, and load on the source.
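The two modes can be contrasted with a minimal sketch. It assumes an in-memory source table with an `updated_at` cursor column; `SOURCE`, `full_refresh`, and `incremental_sync` are hypothetical names for this example, not functions from either tool.

```python
from datetime import datetime

# Hypothetical source table; updated_at serves as the cursor column.
SOURCE = [
    {"id": 1, "name": "a", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "b", "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "name": "c", "updated_at": datetime(2024, 1, 3)},
]


def full_refresh(source):
    """Read every source row and build a brand-new destination table."""
    return {row["id"]: dict(row) for row in source}


def incremental_sync(source, destination, cursor):
    """Upsert only rows whose updated_at is past the cursor.

    Returns the number of rows read and the advanced cursor.
    """
    changed = [r for r in source if cursor is None or r["updated_at"] > cursor]
    for row in changed:
        destination[row["id"]] = dict(row)  # upsert keyed on primary key
    new_cursor = max((r["updated_at"] for r in changed), default=cursor)
    return len(changed), new_cursor


# Initial load: every row is read.
destination = full_refresh(SOURCE)
cursor = max(r["updated_at"] for r in SOURCE)

# The source changes: one row is updated, one row is inserted.
SOURCE[1] = {"id": 2, "name": "b2", "updated_at": datetime(2024, 1, 4)}
SOURCE.append({"id": 4, "name": "d", "updated_at": datetime(2024, 1, 5)})

# Next sync: only the two changed rows are read.
rows_read, cursor = incremental_sync(SOURCE, destination, cursor)
print(rows_read, len(destination), cursor)
```

Note that a cursor of this kind only sees rows that still exist in the source, so a row deleted there would remain in `destination` after an incremental sync.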

4

What is CDC (Change Data Capture) and why is it used in ingestion tools?

5

What main advantage does CDC provide compared to timestamp-based incremental sync?
