Data Engineering

A comprehensive Data Engineering curriculum covering the entire data production chain. It runs from environment setup with Docker and GCP, through pipeline orchestration with Airflow and dbt, to building a Data Warehouse with BigQuery and PostgreSQL. You will master data streaming with PySpark, Pub/Sub, and Apache Beam, and production deployment with Kubernetes and Terraform, and pick up best practices for CI/CD, monitoring, and modern data architecture.

What you will learn

Development environment: Linux, Git, GitHub, VS Code, advanced Python

CI/CD and code quality: Ruff, Pylint, Poetry, GitHub Actions

Containerization with Docker and Docker Compose

APIs with FastAPI: design, deployment, documentation

Data Lake: ingestion, storage, organizing raw data

Data Warehouse with BigQuery: schemas, partitioning, optimization

PostgreSQL: setup, administration, comparison with managed solutions

Data ingestion with Fivetran and Airbyte

Transformation with dbt: models, tests, documentation, modularity

Orchestration with Apache Airflow: DAGs, scheduling, monitoring

Big data with PySpark: large-scale transformations

Data streaming: Google Pub/Sub, Apache Beam, Dataflow

Kubernetes: container deployment, scaling, production clusters

Infrastructure as Code with Terraform

Advanced databases: GraphDB, Document DBs, Wide Column DBs

Logging, monitoring, and pipeline observability

Key topics to master

The most important concepts for understanding this technology and succeeding in interviews

1. Linux and Shell: essential commands, bash scripting, permissions, cron jobs

2. Git and GitHub: branching, merge, rebase, pull requests, CI/CD workflows

3. Advanced Python: OOP, decorators, generators, context managers, typing, async/await

4. CI/CD: linting (Ruff, Pylint), packaging (Poetry), tests, GitHub Actions, pipelines

5. Docker: Dockerfile, images, containers, volumes, networks, multi-stage builds

6. Docker Compose: multi-container services, dependencies, healthchecks, local orchestration

7. FastAPI: routes, Pydantic models, dependencies, middleware, deployment

8. Advanced SQL: window functions, CTEs, analytical queries, optimization, indexing

9. BigQuery: serverless architecture, partitioning, clustering, cost, UDFs, federated queries

10. PostgreSQL: configuration, replication, indexing (B-tree, GIN, GiST), VACUUM, EXPLAIN ANALYZE

11. Data modeling: star schema, fact/dimension tables, normalization, SCD, data vault

12. ELT vs ETL vs ETLT: patterns, trade-offs, architecture choices

13. Fivetran and Airbyte: connectors, sync modes, CDC, schema evolution

14. dbt: models, sources, refs, tests, snapshots, incremental models, Jinja macros

15. Apache Airflow: DAGs, operators, sensors, XCom, connections, pools, task dependencies

16. PySpark: RDD vs DataFrame, transformations, actions, partitioning, broadcast variables

17. Streaming: Pub/Sub (topics, subscriptions), Apache Beam (PCollections, transforms, windowing), Dataflow

18. Kubernetes: pods, deployments, services, ingress, ConfigMaps, Secrets, Helm, scaling

19. Terraform: providers, resources, state, modules, plan/apply, infrastructure as code

20. IAM and security: principle of least privilege, service accounts, GCP roles

21. NoSQL databases: GraphDB (Neo4j), Document DBs (MongoDB, Firestore), Wide Column (Cassandra, Bigtable)

22. Data architecture: Data Lake vs Data Warehouse vs Data Lakehouse, Data Mesh, Data Contracts

23. Monitoring and observability: logging, metrics, alerting, SLA/SLO/SLI, data quality checks

Latest Data Engineering articles

Check out the latest articles and guides on Data Engineering

Apache Airflow 2026 Guide: Pipeline Orchestration, DAGs, and Interview Questions

A comprehensive tutorial covering DAG construction with the Task SDK in Apache Airflow 3.2, asset partitions, native async tasks, and the questions most often asked in data engineer interviews.

The Complete dbt 2026 Guide: Data Transformation, Testing Strategies, and Interview Questions

Covers data transformation with dbt from core concepts to real-world practice: layered modeling, incremental strategies, testing methodology, and the questions most often asked in 2026 data engineering interviews, explained in detail with code examples.

The Complete Apache Spark 4 Guide for 2026: New Features, Structured Streaming, and Interview Questions

An in-depth look at the key new features of Apache Spark 4: ANSI SQL mode, the VARIANT data type, real-time streaming mode, and Spark Connect. Essential questions and answers for data engineering interviews are also included.

View all Data Engineering articles