Data Engineering

Fivetran & Airbyte - Data Ingestion

Connectors, sync modes (full, incremental), CDC, schema evolution, transformations, monitoring

20 interview questions · Mid-Level
1

What is the main difference between Fivetran and Airbyte in terms of deployment model?

Answer

Fivetran is a fully managed SaaS solution whose infrastructure is run by Fivetran, whereas Airbyte offers an open-source self-hosted model in addition to its cloud offering. Airbyte can be deployed on your own infrastructure (Docker, Kubernetes), which gives you more control over data and cost, while Fivetran handles all maintenance and so simplifies operations.

2

What is a connector in the context of Fivetran or Airbyte?

Answer

A connector is a preconfigured component that extracts data from a specific source (database, API, SaaS) and loads it into a destination (data warehouse, data lake). Each connector handles authentication, pagination, error handling, and schema mapping for its source, so you do not need to write custom integration code.
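To make the connector's responsibilities concrete, here is a minimal sketch in Python. It is not how Fivetran or Airbyte implement connectors. `read_all`, `fetch_page`, and `field_map` are hypothetical names introduced for illustration, and authentication is assumed to happen inside the caller-supplied `fetch_page`. The sketch shows only token-based pagination, a simple retry on transient errors, and a source-to-destination field mapping.

```python
from typing import Any, Callable, Iterator, Optional

# A page is (records, next_page_token); a token of None means "no more pages".
Page = tuple[list[dict[str, Any]], Optional[str]]


def read_all(
    fetch_page: Callable[[Optional[str]], Page],  # hypothetical source client
    field_map: dict[str, str],                    # source field -> destination column
    max_retries: int = 3,
) -> Iterator[dict[str, Any]]:
    """Yield every record from a paginated source, renamed via field_map."""
    token: Optional[str] = None
    while True:
        # Error handling: retry transient failures, re-raise on the last attempt.
        for attempt in range(max_retries):
            try:
                records, next_token = fetch_page(token)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
        # Schema mapping: keep only mapped fields, using destination names.
        for record in records:
            yield {dest: record.get(src) for src, dest in field_map.items()}
        # Pagination: stop when the source reports no further page.
        if next_token is None:
            return
        token = next_token
```

A real connector would also persist sync state and back off between retries; those parts are left out to keep the sketch short.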

3

What is the difference between Full Refresh and Incremental sync?

Answer

Full Refresh extracts all data from the source on every sync and replaces the existing data in the destination. Incremental uses a cursor (a timestamp or an auto-increment ID) to transfer only the rows that are new or changed since the last sync. Incremental is more efficient in terms of time, cost, and load on the source.
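The two modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not either tool's implementation. The function names, the in-memory `destination` dictionary, and the `key` and `cursor_field` parameters are all hypothetical. The incremental variant upserts by primary key so that a changed row overwrites its older copy.

```python
from typing import Any

Row = dict[str, Any]


def full_refresh(source_rows: list[Row], key: str) -> dict[Any, Row]:
    """Rebuild the destination from scratch out of every source row."""
    return {row[key]: dict(row) for row in source_rows}


def incremental_sync(
    source_rows: list[Row],
    destination: dict[Any, Row],  # destination table, keyed by primary key
    key: str,
    cursor_field: str,
    last_cursor: Any = None,      # cursor saved by the previous sync, if any
) -> Any:
    """Upsert only rows whose cursor is past last_cursor; return the new cursor."""
    changed = [
        row for row in source_rows
        if last_cursor is None or row[cursor_field] > last_cursor
    ]
    for row in changed:
        destination[row[key]] = dict(row)
    # If nothing changed, keep the previous cursor.
    return max((row[cursor_field] for row in changed), default=last_cursor)
```

The value returned by `incremental_sync` is what gets stored between runs. Note that a cursor of this kind cannot see rows deleted at the source.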

4

What is CDC (Change Data Capture), and why is it used in ingestion tools?

5

What is the main advantage of CDC over timestamp-based incremental sync?

