Data Engineering

Apache Airflow - ๊ณ ๊ธ‰

Sensors, XCom, TaskFlow API, pools, priority, dynamic DAGs, KubernetesPodOperator, monitoring

20 ๋ฉด์ ‘ ์งˆ๋ฌธยท
Senior
1

Apache Airflow์—์„œ Sensor์˜ ์ฃผ์š” ์—ญํ• ์€ ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?

๋‹ต๋ณ€

Sensor๋Š” DAG ์‹คํ–‰์„ ๊ณ„์†ํ•˜๊ธฐ ์ „์— ์กฐ๊ฑด์ด ์ถฉ์กฑ๋  ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ฆฌ๋Š” ํŠน์ˆ˜ ์˜คํผ๋ ˆ์ดํ„ฐ์ž…๋‹ˆ๋‹ค. ํŒŒ์ผ ๋„์ฐฉ, ํŒŒํ‹ฐ์…˜ ๊ฐ€์šฉ์„ฑ ๋˜๋Š” ๋‹ค๋ฅธ ์ž‘์—…์˜ ์ƒํƒœ์™€ ๊ฐ™์€ ์กฐ๊ฑด์ด ์ถฉ์กฑ๋˜์—ˆ๋Š”์ง€ ์ฃผ๊ธฐ์ ์œผ๋กœ ํ™•์ธ(poke)ํ•ฉ๋‹ˆ๋‹ค. Sensor๋Š” ์™ธ๋ถ€ ์ด๋ฒคํŠธ์— ์˜์กดํ•˜๋Š” ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜ํ•˜๋Š” ๋ฐ ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค.

2

Sensor์˜ 'poke' ๋ชจ๋“œ์™€ 'reschedule' ๋ชจ๋“œ์˜ ์ฐจ์ด์ ์€ ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?

๋‹ต๋ณ€

poke ๋ชจ๋“œ์—์„œ Sensor๋Š” worker slot์„ ์ง€์†์ ์œผ๋กœ ์ ์œ ํ•˜๊ณ  ์ •๊ธฐ์ ์ธ ๊ฐ„๊ฒฉ(poke_interval)์œผ๋กœ ์กฐ๊ฑด์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. reschedule ๋ชจ๋“œ์—์„œ๋Š” Sensor๊ฐ€ ๊ฐ ํ™•์ธ ์‚ฌ์ด์— worker slot์„ ํ•ด์ œํ•˜๊ณ  ์ž์ฒด์ ์œผ๋กœ ์žฌ์˜ˆ์•ฝํ•ฉ๋‹ˆ๋‹ค. reschedule ๋ชจ๋“œ๋Š” ์‹œ๊ฐ„์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฌ๋Š” ์กฐ๊ฑด์— ๊ถŒ์žฅ๋˜๋ฉฐ ๋‹ค๋ฅธ ์ž‘์—…์„ ์œ„ํ•ด ๋ฆฌ์†Œ์Šค๋ฅผ ํ™•๋ณดํ•ฉ๋‹ˆ๋‹ค.

3

Hive ํŒŒํ‹ฐ์…˜์ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•ด์งˆ ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ฆฌ๊ธฐ ์œ„ํ•ด ์–ด๋–ค Sensor๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๊นŒ?

๋‹ต๋ณ€

HivePartitionSensor๋Š” Hive ํ…Œ์ด๋ธ”์˜ ํŠน์ • ํŒŒํ‹ฐ์…˜ ์กด์žฌ ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ๋ณ€ํ™˜์„ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— ์†Œ์Šค ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ์—์„œ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. schema, table, partition๊ณผ ๊ฐ™์€ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋ฐ›์Šต๋‹ˆ๋‹ค.

4

๋‘ ๊ฐœ์˜ Airflow ์ž‘์—… ๊ฐ„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ „๋‹ฌํ•˜๋Š” ๋ฐฉ๋ฒ•์€?

5

XCom์— ์ €์žฅ๋˜๋Š” ๋ฐ์ดํ„ฐ์˜ ๊ถŒ์žฅ ์ตœ๋Œ€ ํฌ๊ธฐ๋Š” ์–ผ๋งˆ์ž…๋‹ˆ๊นŒ?

+17 ๋ฉด์ ‘ ์งˆ๋ฌธ

๊ธฐํƒ€ Data Engineering ๋ฉด์ ‘ ์ฃผ์ œ

Linux & Shell - ๊ธฐ์ดˆ

Junior
20๊ฐœ ์งˆ๋ฌธ

Git & GitHub - ๊ธฐ์ดˆ

Junior
20๊ฐœ ์งˆ๋ฌธ

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง์„ ์œ„ํ•œ ๊ณ ๊ธ‰ Python

Junior
25๊ฐœ ์งˆ๋ฌธ

Docker - ๊ธฐ์ดˆ

Junior
25๊ฐœ ์งˆ๋ฌธ

Google Cloud Platform - ๊ธฐ์ดˆ

Junior
20๊ฐœ ์งˆ๋ฌธ

CI/CD ๋ฐ ์ฝ”๋“œ ํ’ˆ์งˆ

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Docker Compose

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

FastAPI - ๋ฐ์ดํ„ฐ API

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Data Engineering์„ ์œ„ํ•œ ๊ณ ๊ธ‰ SQL

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Data Lake - ์•„ํ‚คํ…์ฒ˜ ๋ฐ ์ˆ˜์ง‘

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง์„ ์œ„ํ•œ BigQuery

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

PostgreSQL - ๊ด€๋ฆฌ

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Data Engineering์„ ์œ„ํ•œ Data Modeling

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Fivetran & Airbyte - ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

dbt - ๊ธฐ์ดˆ

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Apache Airflow - ๊ธฐ์ดˆ

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

Kubernetes - ๊ธฐ์ดˆ

Mid-Level
20๊ฐœ ์งˆ๋ฌธ

dbt - ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ

Senior
20๊ฐœ ์งˆ๋ฌธ

ETL / ELT / ETLT ํŒจํ„ด

Senior
20๊ฐœ ์งˆ๋ฌธ

Airflow + dbt - ํŒŒ์ดํ”„๋ผ์ธ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜

Senior
20๊ฐœ ์งˆ๋ฌธ

PySpark - ๋Œ€๊ทœ๋ชจ ์ฒ˜๋ฆฌ

Senior
20๊ฐœ ์งˆ๋ฌธ

Google Pub/Sub - ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆฌ๋ฐ

Senior
20๊ฐœ ์งˆ๋ฌธ

Apache Beam & Dataflow

Senior
20๊ฐœ ์งˆ๋ฌธ

Kubernetes - ํ”„๋กœ๋•์…˜ ๋ฐ ์Šค์ผ€์ผ๋ง

Senior
20๊ฐœ ์งˆ๋ฌธ

Terraform - Infrastructure as Code

Senior
20๊ฐœ ์งˆ๋ฌธ

NoSQL ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค

Senior
20๊ฐœ ์งˆ๋ฌธ

๋ชจ๋˜ Data Architecture

Senior
20๊ฐœ ์งˆ๋ฌธ

๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๊ด€์ฐฐ ๊ฐ€๋Šฅ์„ฑ

Senior
20๊ฐœ ์งˆ๋ฌธ

IAM ๋ฐ ๋ฐ์ดํ„ฐ ๋ณด์•ˆ

Senior
20๊ฐœ ์งˆ๋ฌธ

๋‹ค์Œ ๋ฉด์ ‘์„ ์œ„ํ•ด Data Engineering์„ ๋งˆ์Šคํ„ฐํ•˜์„ธ์š”

๋ชจ๋“  ์งˆ๋ฌธ, flashcards, ๊ธฐ์ˆ  ํ…Œ์ŠคํŠธ, ์ฝ”๋“œ ๋ฆฌ๋ทฐ ์—ฐ์Šต, ๋ฉด์ ‘ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์— ์ ‘๊ทผํ•˜์„ธ์š”.

๋ฌด๋ฃŒ๋กœ ์‹œ์ž‘ํ•˜๊ธฐ