Apache Airflow 2026๋…„ ๊ฐ€์ด๋“œ: ํŒŒ์ดํ”„๋ผ์ธ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜, DAG ๋ฐ ๋ฉด์ ‘ ์งˆ๋ฌธ ์ •๋ฆฌ

Apache Airflow 3.2์˜ Task SDK๋ฅผ ํ™œ์šฉํ•œ DAG ๊ตฌ์ถ•, ์—์…‹ ํŒŒํ‹ฐ์…˜, ๋„ค์ดํ‹ฐ๋ธŒ ๋น„๋™๊ธฐ ํƒœ์Šคํฌ, ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด ๋ฉด์ ‘์—์„œ ์ž์ฃผ ์ถœ์ œ๋˜๋Š” ์งˆ๋ฌธ์„ ์ข…ํ•ฉ์ ์œผ๋กœ ๋‹ค๋ฃจ๋Š” ํŠœํ† ๋ฆฌ์–ผ์ž…๋‹ˆ๋‹ค.

Apache Airflow 2026๋…„ ๊ฐ€์ด๋“œ: ํŒŒ์ดํ”„๋ผ์ธ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜, DAG ๋ฐ ๋ฉด์ ‘ ์งˆ๋ฌธ ์ •๋ฆฌ

Apache Airflow 3.2๋Š” 3.0 ์•„ํ‚คํ…์ฒ˜ ์ „๋ฉด ๊ฐœํŽธ ์ดํ›„ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ์ง„ํ™”๋ฅผ ์ด๋ฃฌ ๋ฆด๋ฆฌ์Šค์ž…๋‹ˆ๋‹ค. PyPI ์›”๊ฐ„ ๋‹ค์šด๋กœ๋“œ ์ˆ˜๊ฐ€ 1,000๋งŒ ๊ฑด์„ ๋„˜์–ด์„ฐ์œผ๋ฉฐ, Airbnb๋ถ€ํ„ฐ Spotify๊นŒ์ง€ ๋‹ค์–‘ํ•œ ๊ธฐ์—…์—์„œ ์ฑ„ํƒํ•˜๊ณ  ์žˆ์–ด ๋ฐ์ดํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์ถ•, ์Šค์ผ€์ค„๋ง, ๋ชจ๋‹ˆํ„ฐ๋ง ๋ถ„์•ผ์—์„œ ํ•ต์‹ฌ ๋„๊ตฌ๋กœ์„œ์˜ ์œ„์ƒ์„ ํ™•๊ณ ํžˆ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” ์ƒˆ๋กœ์šด Task SDK๋ฅผ ํ™œ์šฉํ•œ DAG ์ž‘์„ฑ, ํŒŒ์ดํ”„๋ผ์ธ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜ ํŒจํ„ด, ๊ทธ๋ฆฌ๊ณ  ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด ์ง๋ฌด ๋ฉด์ ‘์—์„œ ์ž์ฃผ ์ถœ์ œ๋˜๋Š” ์งˆ๋ฌธ์„ ์‚ดํŽด๋ด…๋‹ˆ๋‹ค.

Airflow 3.2 ํ•ต์‹ฌ ๋ณ€๊ฒฝ์‚ฌํ•ญ

Airflow 3.2๋Š” 2026๋…„ 4์›”์— ๋ฆด๋ฆฌ์Šค๋˜์—ˆ์œผ๋ฉฐ, ์„ธ๋ถ„ํ™”๋œ ๋ฐ์ดํ„ฐ ์ธ์‹ ์Šค์ผ€์ค„๋ง์„ ์œ„ํ•œ ์—์…‹ ํŒŒํ‹ฐ์…˜, PythonOperator์˜ ๋„ค์ดํ‹ฐ๋ธŒ ๋น„๋™๊ธฐ ์ง€์›, ๋ฉ€ํ‹ฐํŒ€ ๋ฐฐํฌ ๊ธฐ๋Šฅ์ด ๋„์ž…๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋“  DAG ์ž„ํฌํŠธ๋Š” Airflow 3.0์—์„œ ๋„์ž…๋œ ์•ˆ์ •์ ์ธ airflow.sdk ๋„ค์ž„์ŠคํŽ˜์ด์Šค๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

Airflow Task SDK๋ฅผ ํ™œ์šฉํ•œ DAG ์ž‘์„ฑ

Airflow 3.0์—์„œ ๋„์ž…๋œ Task SDK๋Š” DAG ์ •์˜๋ฅผ Airflow ๋‚ด๋ถ€ ๊ตฌํ˜„์œผ๋กœ๋ถ€ํ„ฐ ๋ถ„๋ฆฌํ•˜๋Š” ๋…๋ฆฝ ํŒจํ‚ค์ง€์ž…๋‹ˆ๋‹ค. ํ•ต์‹ฌ ๋ชฉํ‘œ๋Š” Airflow ์—…๊ทธ๋ ˆ์ด๋“œ ์‹œ์—๋„ ์ฝ”๋“œ ๋ณ€๊ฒฝ ์—†์ด ์ž‘๋™ํ•˜๋Š” ์ด์‹ ๊ฐ€๋Šฅํ•˜๊ณ  ์•ˆ์ •์ ์ธ DAG๋ฅผ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. DAG, dag, task, BaseOperator, Connection, Variable ๋“ฑ ๋ชจ๋“  ํ•ต์‹ฌ ๊ฐ์ฒด๊ฐ€ airflow.sdk ํ•˜์œ„์— ๋ฐฐ์น˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

๋ ˆ๊ฑฐ์‹œ ์ž„ํฌํŠธ ๊ฒฝ๋กœ(airflow.decorators.task, airflow.models.dag.DAG)๋Š” 3.2์—์„œ๋„ ๋™์ž‘ํ•˜์ง€๋งŒ ์ง€์› ์ค‘๋‹จ์œผ๋กœ ํ‘œ์‹œ๋˜์–ด ์žˆ์œผ๋ฉฐ, ํ–ฅํ›„ ๋ฆด๋ฆฌ์Šค์—์„œ ์ œ๊ฑฐ๋  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

python
# etl_daily_revenue.py
import pendulum
from airflow.sdk import dag, task

@dag(
    schedule="@daily",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    catchup=False,
    tags=["finance", "etl"],
)
def etl_daily_revenue():
    @task()
    def extract_transactions() -> list[dict]:
        # Pulls raw transactions from the payments API
        import requests
        response = requests.get("https://api.internal/payments/daily")
        return response.json()["transactions"]

    @task(multiple_outputs=True)
    def transform(transactions: list[dict]) -> dict:
        # Aggregates by currency and computes totals
        totals = {}
        for tx in transactions:
            currency = tx["currency"]
            totals[currency] = totals.get(currency, 0) + tx["amount"]
        return {"totals": totals, "count": len(transactions)}

    @task()
    def load(totals: dict, count: int):
        # Inserts aggregated row into the warehouse
        from warehouse import insert_revenue_summary
        insert_revenue_summary(totals=totals, transaction_count=count)

    raw = extract_transactions()
    result = transform(raw)
    load(result["totals"], result["count"])

etl_daily_revenue()

@dag ๋ฐ์ฝ”๋ ˆ์ดํ„ฐ๋Š” ๊ธฐ์กด์˜ with DAG(...) ์ปจํ…์ŠคํŠธ ๋งค๋‹ˆ์ €๋ฅผ ๋Œ€์ฒดํ•ฉ๋‹ˆ๋‹ค. @task ๋ฐ์ฝ”๋ ˆ์ดํ„ฐ๋Š” ์ผ๋ฐ˜ Python ํ•จ์ˆ˜๋ฅผ Airflow ํƒœ์Šคํฌ๋กœ ๋ณ€ํ™˜ํ•˜๋ฉฐ, XCom ์ง๋ ฌํ™”๋Š” ์ž๋™์œผ๋กœ ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค. transform(raw)๋ฅผ ํ˜ธ์ถœํ•˜๋ฉด ์˜์กด ๊ด€๊ณ„๊ฐ€ ์„ ์–ธ๋˜๊ณ , Airflow๋Š” ํ•จ์ˆ˜ ํ˜ธ์ถœ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ DAG ์—ฃ์ง€๋ฅผ ๊ตฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

๊ฐ€๋ณ€ ์›Œํฌ๋กœ๋“œ๋ฅผ ์œ„ํ•œ ๋™์  ํƒœ์Šคํฌ ๋งคํ•‘

์ฒ˜๋ฆฌํ•  ํ•ญ๋ชฉ ์ˆ˜๊ฐ€ ์‹คํ–‰๋งˆ๋‹ค ๋‹ฌ๋ผ์ง€๋Š” ๊ฒฝ์šฐ, ์ •์  DAG๋กœ๋Š” ๋Œ€์‘ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. Airflow 2.4์—์„œ ์•ˆ์ •ํ™”๋˜๊ณ  3.x์—์„œ ๋”์šฑ ๊ฐœ์„ ๋œ ๋™์  ํƒœ์Šคํฌ ๋งคํ•‘์€ .expand()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋Ÿฐํƒ€์ž„์— ํƒœ์Šคํฌ๋ฅผ ํ™•์žฅํ•จ์œผ๋กœ์จ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.

python
# etl_multi_region.py
import pendulum
from airflow.sdk import dag, task

@dag(
    schedule="@hourly",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    catchup=False,
    tags=["regional", "etl"],
)
def etl_multi_region():
    @task()
    def get_regions() -> list[str]:
        # Returns the active regions from config
        return ["us-east-1", "eu-west-1", "ap-southeast-1"]

    @task()
    def process_region(region: str) -> dict:
        # Processes events for a single region
        from data_processor import aggregate_events
        return aggregate_events(region)

    @task()
    def merge_results(results: list[dict]):
        # Combines all regional outputs into a single report
        from reporter import publish_global_report
        publish_global_report(results)

    regions = get_regions()
    # .expand() creates one mapped task instance per region
    processed = process_region.expand(region=regions)
    merge_results(processed)

etl_multi_region()

์‹คํ–‰ ์‹œ์ ์— Airflow๋Š” ๋ฆฌ์ „๋ณ„๋กœ process_region์˜ ๋ณ‘๋ ฌ ์ธ์Šคํ„ด์Šค๋ฅผ 3๊ฐœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ ๋‚  ๋„ค ๋ฒˆ์งธ ๋ฆฌ์ „์ด ์ถ”๊ฐ€๋˜๋”๋ผ๋„ DAG ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•  ํ•„์š”๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค. merge_results ํƒœ์Šคํฌ๋Š” ๋ชจ๋“  ๋งคํ•‘๋œ ์ธ์Šคํ„ด์Šค๊ฐ€ ์™„๋ฃŒ๋  ๋•Œ๊นŒ์ง€ ๋Œ€๊ธฐํ•ฉ๋‹ˆ๋‹ค.

๋งคํ•‘๋œ ํƒœ์Šคํฌ ์ œํ•œ

๊ธฐ๋ณธ์ ์œผ๋กœ Airflow๋Š” ๋งคํ•‘๋‹น 1024๊ฐœ์˜ ์ธ์Šคํ„ด์Šค๋กœ ์ œํ•œํ•ฉ๋‹ˆ๋‹ค. ๋” ํฐ ๋ฐ์ดํ„ฐ์…‹์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒฝ์šฐ DAG ์„ค์ •์—์„œ max_map_length๋ฅผ ์˜ค๋ฒ„๋ผ์ด๋“œํ•˜์—ฌ ์ด ๊ฐ’์„ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์—์…‹ ํŒŒํ‹ฐ์…˜: Airflow 3.2์˜ ๋ฐ์ดํ„ฐ ์ธ์‹ ์Šค์ผ€์ค„๋ง

Airflow 3.2 ์ด์ „์—๋Š” ๋ฐ์ดํ„ฐ ์ธ์‹ ์Šค์ผ€์ค„๋ง์ด ์—์…‹ ์ˆ˜์ค€์—์„œ ์ž‘๋™ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ”„๋กœ๋“€์„œ DAG๊ฐ€ ์—์…‹์„ ์—…๋ฐ์ดํŠธํ•˜๋ฉด, ์–ด๋–ค ๋ฐ์ดํ„ฐ ์Šฌ๋ผ์ด์Šค๊ฐ€ ๋ณ€๊ฒฝ๋˜์—ˆ๋Š”์ง€์™€ ๊ด€๊ณ„์—†์ด ๋ชจ๋“  ์ปจ์Šˆ๋จธ DAG๊ฐ€ ํŠธ๋ฆฌ๊ฑฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์—์…‹ ํŒŒํ‹ฐ์…˜์€ ํŒŒํ‹ฐ์…˜ ์ˆ˜์ค€์˜ ์„ธ๋ถ„์„ฑ์„ ์ง€์›ํ•˜์—ฌ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค.

์„ธ ๊ฐœ์˜ ์ƒ์œ„ DAG๊ฐ€ ์„œ๋กœ ๋‹ค๋ฅธ ์Šคํฌ์ธ  ๋ฆฌ๊ทธ์˜ ์‹œ๊ฐ„๋ณ„ ์„ ์ˆ˜ ํ†ต๊ณ„๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ์ƒ๊ฐํ•ด ๋ด…์‹œ๋‹ค. ํ•˜์œ„ ๋ถ„์„ DAG๋Š” ์„ธ ๋ฆฌ๊ทธ ๋ชจ๋‘ ๋™์ผํ•œ ์‹œ๊ฐ„๋Œ€์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฒŒ์‹œํ•œ ๊ฒฝ์šฐ์—๋งŒ ํŠธ๋ฆฌ๊ฑฐ๋˜์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

python
# downstream_analytics.py
import pendulum
from airflow.sdk import dag, task
from airflow.timetables.assets import CronPartitionTimetable

# Define partitioned assets with hourly granularity
nba_stats = Asset("s3://datalake/nba/hourly/", partitions=CronPartitionTimetable("0 * * * *"))
epl_stats = Asset("s3://datalake/epl/hourly/", partitions=CronPartitionTimetable("0 * * * *"))
nfl_stats = Asset("s3://datalake/nfl/hourly/", partitions=CronPartitionTimetable("0 * * * *"))

@dag(
    # Triggers only when all three assets have matching partition
    schedule=(nba_stats & epl_stats & nfl_stats),
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    catchup=False,
)
def unified_sports_analytics():
    @task()
    def aggregate_all_leagues(**context):
        partition = context["partition"]
        # Process only the specific hourly slice across all leagues
        from analytics import build_cross_league_report
        build_cross_league_report(partition=partition)

    aggregate_all_leagues()

unified_sports_analytics()

ํŒŒํ‹ฐ์…˜ ๊ธฐ๋ฐ˜ ์Šค์ผ€์ค„๋ง์€ ๋ถˆํ•„์š”ํ•œ ํŒŒ์ดํ”„๋ผ์ธ ์‹คํ–‰์„ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค. ํ•˜์œ„ DAG๋Š” ํ•„์š”ํ•œ ํŒŒํ‹ฐ์…˜์ด ๋ชจ๋“  ์ƒ์œ„ ์†Œ์Šค์—์„œ ์ค€๋น„๋œ ๊ฒฝ์šฐ์—๋งŒ ์‹คํ–‰๋˜๋ฉฐ, ๋ถ€๋ถ„์ ์ธ ๋ฐ์ดํ„ฐ๋กœ๋Š” ์‹คํ–‰๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

Data Engineering ๋ฉด์ ‘ ์ค€๋น„๊ฐ€ ๋˜์…จ๋‚˜์š”?

์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ, flashcards, ๊ธฐ์ˆ  ํ…Œ์ŠคํŠธ๋กœ ์—ฐ์Šตํ•˜์„ธ์š”.

I/O ์ง‘์•ฝ์  ์›Œํฌ๋กœ๋“œ๋ฅผ ์œ„ํ•œ ๋„ค์ดํ‹ฐ๋ธŒ ๋น„๋™๊ธฐ ํƒœ์Šคํฌ

Airflow 3.2์—์„œ๋Š” PythonOperator์— ๋„ค์ดํ‹ฐ๋ธŒ ๋น„๋™๊ธฐ ์ง€์›์ด ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด์ „์—๋Š” ๋Œ€๋Ÿ‰์˜ API ํ˜ธ์ถœ์ด๋‚˜ ๋ฐฐ์น˜ ํŒŒ์ผ ๋‹ค์šด๋กœ๋“œ ๊ฐ™์€ ๋ณ‘๋ ฌ I/O ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋ ค๋ฉด ์ปค์Šคํ…€ deferrable ์˜คํผ๋ ˆ์ดํ„ฐ๋ฅผ ์ž‘์„ฑํ•ด์•ผ ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ๋Š” async ํ•จ์ˆ˜๋ฅผ ์ง์ ‘ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

python
# async_file_download.py
import asyncio
import aiohttp
from airflow.sdk import dag, task
import pendulum

@dag(
    schedule="@daily",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    catchup=False,
    tags=["async", "download"],
)
def async_file_download():
    @task()
    async def download_files():
        urls = [f"https://data.source/files/{i}.csv" for i in range(500)]
        async with aiohttp.ClientSession() as session:
            # Downloads 500 files concurrently instead of sequentially
            tasks = [fetch_and_save(session, url) for url in urls]
            results = await asyncio.gather(*tasks)
        return {"downloaded": len(results)}

    download_files()

async def fetch_and_save(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as resp:
        content = await resp.read()
        filename = url.split("/")[-1]
        with open(f"/data/downloads/{filename}", "wb") as f:
            f.write(content)
        return filename

async_file_download()

๋น„๋™๊ธฐ ๋ฐฉ์‹์€ ๋‹จ์ผ ์›Œ์ปค ์Šฌ๋กฏ์—์„œ 500๊ฐœ์˜ ํŒŒ์ผ์„ ๋™์‹œ์— ๋‹ค์šด๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค. ๋™๊ธฐ ๋ฒ„์ „์˜ 500ํšŒ ์ˆœ์ฐจ HTTP ์š”์ฒญ๊ณผ ๋น„๊ตํ•˜๋ฉด, I/O ๋ฐ”์šด๋“œ ์›Œํฌ๋กœ๋“œ์—์„œ ์ˆ˜์‹ญ ๋ฐฐ์˜ ์†๋„ ํ–ฅ์ƒ์„ ์‹คํ˜„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ์ถ”๊ฐ€ ์›Œ์ปค๋ฅผ ํ”„๋กœ๋น„์ €๋‹ํ•  ํ•„์š”๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค.

Airflow ์•„ํ‚คํ…์ฒ˜: ๊ตฌ์„ฑ ์š”์†Œ์™€ ์‹คํ–‰ ํ๋ฆ„

Airflow์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์€ ํ”„๋กœ๋•์…˜ ์šด์˜๊ณผ ๋ฉด์ ‘ ๋Œ€๋น„ ๋ชจ๋‘์— ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค. ๋ชจ๋“  DAG ์‹คํ–‰์—์„œ ๋‹ค์„ฏ ๊ฐ€์ง€ ๊ตฌ์„ฑ ์š”์†Œ๊ฐ€ ์ƒํ˜ธ์ž‘์šฉํ•ฉ๋‹ˆ๋‹ค.

  • ์Šค์ผ€์ค„๋Ÿฌ โ€” DAG ํŒŒ์ผ์„ ํŒŒ์‹ฑํ•˜๊ณ , ์˜์กด ๊ด€๊ณ„๋ฅผ ํ•ด์„ํ•˜๋ฉฐ, ์‹คํ–‰์„ ์œ„ํ•ด ํƒœ์Šคํฌ๋ฅผ ํ์— ๋„ฃ์Šต๋‹ˆ๋‹ค. Airflow 3.x์—์„œ ์Šค์ผ€์ค„๋Ÿฌ๋Š” ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ๋Œ€ํ•ด ๋ฌด์ƒํƒœ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.
  • ์ต์Šคํํ„ฐ โ€” ํƒœ์Šคํฌ๊ฐ€ ์‹คํ–‰๋˜๋Š” ์œ„์น˜๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. LocalExecutor๋Š” ๋‹จ์ผ ๋จธ์‹  ํ™˜๊ฒฝ์„ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค. CeleryExecutor๋Š” ์›Œ์ปค ๋…ธ๋“œ ๊ฐ„์— ๋ถ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. KubernetesExecutor๋Š” ํƒœ์Šคํฌ๋งˆ๋‹ค ํŒŒ๋“œ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • ์›Œ์ปค โ€” ์‹ค์ œ ํƒœ์Šคํฌ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค. Airflow 3.0์˜ Task Execution API๋ฅผ ํ†ตํ•ด ์›Œ์ปค๋Š” ์•ˆ์ •๋œ ๊ณ„์•ฝ ๊ธฐ๋ฐ˜์œผ๋กœ ํ†ต์‹ ํ•˜๋ฉฐ, ์ปจํ…Œ์ด๋„ˆ, ์—ฃ์ง€ ํ™˜๊ฒฝ, ์™ธ๋ถ€ ๋Ÿฐํƒ€์ž„์—์„œ์˜ ์‹คํ–‰์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค โ€” PostgreSQL(๊ถŒ์žฅ) ๋˜๋Š” MySQL์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. DAG ์ •์˜, ํƒœ์Šคํฌ ์ƒํƒœ, XCom ๊ฐ’, ์—ฐ๊ฒฐ ์ •๋ณด, ๊ฐ์‚ฌ ๋กœ๊ทธ๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
  • ์›น ์„œ๋ฒ„ โ€” DAG ์‹คํ–‰ ๋ชจ๋‹ˆํ„ฐ๋ง, ๋กœ๊ทธ ํ™•์ธ, ์ˆ˜๋™ ์‹คํ–‰ ํŠธ๋ฆฌ๊ฑฐ, ์—ฐ๊ฒฐ ๊ด€๋ฆฌ๋ฅผ ์œ„ํ•œ Airflow UI์ž…๋‹ˆ๋‹ค.
ํ”„๋กœ๋•์…˜ ์ต์Šคํํ„ฐ ์„ ํƒ

ํ”„๋กœ๋•์…˜ ํ™˜๊ฒฝ์—์„œ๋Š” SequentialExecutor ์‚ฌ์šฉ์„ ํ”ผํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์˜ ํƒœ์Šคํฌ๋งŒ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ๊ฐœ๋ฐœ ์šฉ๋„๋กœ๋งŒ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. Kubernetes ๋„ค์ดํ‹ฐ๋ธŒ ํ™˜๊ฒฝ์—์„œ๋Š” KubernetesExecutor๊ฐ€ ๊ฐ€์žฅ ๊ฐ•๋ ฅํ•œ ๊ฒฉ๋ฆฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ํƒœ์Šคํฌ๊ฐ€ ๋…๋ฆฝ์ ์ธ ๋ฆฌ์†Œ์Šค์™€ ์˜์กด์„ฑ์„ ๊ฐ€์ง„ ๊ฐœ๋ณ„ ํŒŒ๋“œ์—์„œ ์‹คํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

2026๋…„ Airflow vs. Prefect vs. Dagster ๋น„๊ต

Airflow์—๋Š” ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ๊ฒฝ์Ÿ ๋„๊ตฌ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ฌ๋ฐ”๋ฅธ ์„ ํƒ์€ ํŒ€ ๊ทœ๋ชจ, ๊ธฐ์กด ์ธํ”„๋ผ, ๊ตฌ์ถ•ํ•˜๋Š” ํŒŒ์ดํ”„๋ผ์ธ์˜ ์œ ํ˜•์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.

| ๊ธฐ๋Šฅ | Airflow 3.2 | Prefect 3.x | Dagster 1.9 | |---|---|---|---| | DAG ์ •์˜ | Python ๋ฐ์ฝ”๋ ˆ์ดํ„ฐ(airflow.sdk) | Python ๋ฐ์ฝ”๋ ˆ์ดํ„ฐ(@flow, @task) | Python ๋ฐ์ฝ”๋ ˆ์ดํ„ฐ(@asset, @op) | | ์Šค์ผ€์ค„๋ง | Cron, ์—์…‹ ์ธ์‹, ํŒŒํ‹ฐ์…˜ ์ธ์‹ | Cron, ์ด๋ฒคํŠธ ๊ธฐ๋ฐ˜ | Cron, ์„ผ์„œ ๊ธฐ๋ฐ˜, ์—์…‹ ์ธ์‹ | | ์‹คํ–‰ ๋ชจ๋ธ | ์ค‘์•™ ์Šค์ผ€์ค„๋Ÿฌ + ๋ถ„์‚ฐ ์›Œ์ปค | ํ•˜์ด๋ธŒ๋ฆฌ๋“œ(์„œ๋ฒ„ + ์›Œํฌ ํ’€) | ์ค‘์•™ dagster-daemon | | ๋™์  ํƒœ์Šคํฌ | .expand() ๋งคํ•‘๋œ ํƒœ์Šคํฌ | ๋„ค์ดํ‹ฐ๋ธŒ Python ๋ฃจํ”„ | ๋™์  ํŒŒํ‹ฐ์…˜ | | ๋น„๋™๊ธฐ ์ง€์› | 3.2์—์„œ ๋„ค์ดํ‹ฐ๋ธŒ ์ง€์› | 2.0๋ถ€ํ„ฐ ๋„ค์ดํ‹ฐ๋ธŒ ์ง€์› | ๋น„๋™๊ธฐ I/O ์˜คํผ๋ ˆ์ด์…˜ | | ๋ฉ€ํ‹ฐํŒ€ ๊ฒฉ๋ฆฌ | ๋‚ด์žฅ(3.2 ์‹คํ—˜์  ๊ธฐ๋Šฅ) | ์›Œํฌ์ŠคํŽ˜์ด์Šค ๊ธฐ๋ฐ˜(Cloud) | ๋ธŒ๋žœ์น˜ ๋ฐฐํฌ | | ์ปค๋ฎค๋‹ˆํ‹ฐ ๊ทœ๋ชจ | ์ตœ๋Œ€(GitHub 35,000+ ์Šคํƒ€) | ์„ฑ์žฅ ์ค‘(18,000+ ์Šคํƒ€) | ์„ฑ์žฅ ์ค‘(12,000+ ์Šคํƒ€) | | ์ ํ•ฉํ•œ ํ™˜๊ฒฝ | ๋Œ€๊ทœ๋ชจ ๋ณต์žกํ•œ ๋ฉ€ํ‹ฐํŒ€ ํŒŒ์ดํ”„๋ผ์ธ | ๋น ๋ฅธ ๊ฐœ๋ฐœ ์‚ฌ์ดํด์„ ์›ํ•˜๋Š” ์†Œ๊ทœ๋ชจ ํŒ€ | ๋ฐ์ดํ„ฐ ์—์…‹ ์ค‘์‹ฌ ์กฐ์ง |

Airflow์˜ ๊ฐ•์ ์€ ์—์ฝ”์‹œ์Šคํ…œ์— ์žˆ์Šต๋‹ˆ๋‹ค. ์ฃผ์š” ํด๋ผ์šฐ๋“œ ์„œ๋น„์Šค, ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค, API๋ฅผ ํฌ๊ด„ํ•˜๋Š” 80๊ฐœ ์ด์ƒ์˜ ํ”„๋กœ๋ฐ”์ด๋” ํŒจํ‚ค์ง€๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Prefect๋Š” ๋ณด์ผ๋Ÿฌํ”Œ๋ ˆ์ดํŠธ๊ฐ€ ์ ์€ ์šฐ์ˆ˜ํ•œ ๊ฐœ๋ฐœ์ž ๊ฒฝํ—˜์— ๊ฐ•์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. Dagster์˜ ์—์…‹ ์ค‘์‹ฌ ๋ชจ๋ธ์€ ํƒœ์Šคํฌ ์‹œํ€€์Šค๊ฐ€ ์•„๋‹Œ ๋ฐ์ดํ„ฐ ํ”„๋กœ๋•ํŠธ ๊ด€์ ์—์„œ ์‚ฌ๊ณ ํ•˜๋Š” ํŒ€์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด ๋ฉด์ ‘์—์„œ ์ถœ์ œ๋˜๋Š” Apache Airflow ์งˆ๋ฌธ

๋‹ค์Œ ์งˆ๋ฌธ๋“ค์€ Airflow๋ฅผ ํ”„๋กœ๋•์…˜ ํ™˜๊ฒฝ์—์„œ ์šด์˜ํ•˜๋Š” ๊ธฐ์—…์˜ ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด ๋ฉด์ ‘์—์„œ ์‹ค์ œ๋กœ ์ถœ์ œ๋˜๋Š” ๋‚ด์šฉ์„ ๋ฐ˜์˜ํ•ฉ๋‹ˆ๋‹ค.

DAG๋ž€ ๋ฌด์—‡์ด๋ฉฐ, Airflow์—์„œ ์–ด๋–ป๊ฒŒ ํ™œ์šฉ๋˜๋Š”๊ฐ€

DAG(Directed Acyclic Graph, ๋ฐฉํ–ฅ ๋น„์ˆœํ™˜ ๊ทธ๋ž˜ํ”„)๋Š” ์ˆœํ™˜ ์ฐธ์กฐ๊ฐ€ ์—†๋Š” ๊ฒƒ์ด ๋ณด์žฅ๋œ ํƒœ์Šคํฌ์™€ ์˜์กด ๊ด€๊ณ„์˜ ์ง‘ํ•ฉ์œผ๋กœ ์›Œํฌํ”Œ๋กœ๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. Airflow๋Š” dags/ ํด๋”์˜ Python ํŒŒ์ผ์„ ํŒŒ์‹ฑํ•˜์—ฌ ์˜์กด ๊ด€๊ณ„ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ตฌ์ถ•ํ•˜๊ณ , ์Šค์ผ€์ค„๋Ÿฌ๊ฐ€ ์‹คํ–‰ ์ˆœ์„œ๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ DAG ์‹คํ–‰์€ ๋…ผ๋ฆฌ์  ๋‚ ์งœ์— ์—ฐ๊ฒฐ๋œ DagRun ๊ฐ์ฒด๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. "๋น„์ˆœํ™˜" ์ œ์•ฝ ์กฐ๊ฑด์œผ๋กœ ์ธํ•ด ์Šค์ผ€์ค„๋Ÿฌ๋Š” ํ•ญ์ƒ ์œ ํšจํ•œ ์‹คํ–‰ ์ˆœ์„œ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

XCom์˜ ์ž‘๋™ ๋ฐฉ์‹๊ณผ ์‚ฌ์šฉ์„ ํ”ผํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๋Š” ๋ฌด์—‡์ธ๊ฐ€

XCom(Cross-Communication)์€ ํƒœ์Šคํฌ ๊ฐ„์— ์†Œ๋Ÿ‰์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ „๋‹ฌํ•˜๋Š” ๋ฉ”์ปค๋‹ˆ์ฆ˜์ž…๋‹ˆ๋‹ค. ํƒœ์Šคํฌ๊ฐ€ ๋ฐ˜ํ™˜๊ฐ’์„ XCom์— ํ‘ธ์‹œํ•˜๊ณ , ํ•˜์œ„ ํƒœ์Šคํฌ๊ฐ€ ์ด๋ฅผ ํ’€ํ•ฉ๋‹ˆ๋‹ค. Task SDK์—์„œ๋Š” ๋ฐ์ฝ”๋ ˆ์ดํŒ…๋œ ํƒœ์Šคํฌ ๊ฐ„ ํ•จ์ˆ˜ ๋ฐ˜ํ™˜๊ฐ’ ์ „๋‹ฌ ์‹œ ์ž๋™์œผ๋กœ ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค. XCom์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋ฏ€๋กœ, ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ(์ˆ˜ KB ์ด์ƒ)์˜ ์ „์†ก์—๋Š” ์ ํ•ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ํฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ „์†กํ•  ๋•Œ๋Š” ์™ธ๋ถ€ ์Šคํ† ๋ฆฌ์ง€(S3, GCS)๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , XCom์—๋Š” ์ฐธ์กฐ ๊ฒฝ๋กœ๋งŒ ์ „๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

schedule, start_date, catchup์˜ ์ฐจ์ด์ ์„ ์„ค๋ช…ํ•˜๋ผ

schedule ํŒŒ๋ผ๋ฏธํ„ฐ(Airflow 3.x์—์„œ schedule_interval์—์„œ ์ด๋ฆ„ ๋ณ€๊ฒฝ)๋Š” DAG์˜ ์‹คํ–‰ ๋นˆ๋„๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. cron ๋ฌธ์ž์—ด, timedelta, ํƒ€์ž„ํ…Œ์ด๋ธ” ๊ฐ์ฒด, ๋˜๋Š” ์—์…‹ ํŠธ๋ฆฌ๊ฑฐ๋ฅผ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. start_date๋Š” DAG ์‹คํ–‰์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ์ด๋ฅธ ๋…ผ๋ฆฌ์  ๋‚ ์งœ๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. catchup=True(๊ธฐ๋ณธ๊ฐ’)๋Š” start_date๋ถ€ํ„ฐ ํ˜„์žฌ๊นŒ์ง€ ๋ˆ„๋ฝ๋œ ๋ชจ๋“  ๊ฐ„๊ฒฉ์— ๋Œ€ํ•ด DAG ์‹คํ–‰์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. catchup=False๋ฅผ ์„ค์ •ํ•˜๋ฉด ๊ณผ๊ฑฐ ๊ฐ„๊ฒฉ์„ ๊ฑด๋„ˆ๋›ฐ๊ณ  ํ˜„์žฌ ์‹œ์ ๋ถ€ํ„ฐ๋งŒ ์Šค์ผ€์ค„๋งํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กœ๋•์…˜์˜ ์ผ๋ฐ˜์ ์ธ ํŒจํ„ด์€ ์šด์˜ DAG์— catchup=False๋ฅผ, ์ด๋ ฅ ๋ฐฑํ•„์— catchup=True๋ฅผ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

KubernetesExecutor์™€ CeleryExecutor์˜ ์ฐจ์ด์ ์€ ๋ฌด์—‡์ธ๊ฐ€

CeleryExecutor๋Š” ๋ฉ”์‹œ์ง€ ๋ธŒ๋กœ์ปค(Redis ๋˜๋Š” RabbitMQ)๋ฅผ ํ†ตํ•ด ์—ฐ๊ฒฐ๋œ ์žฅ์‹œ๊ฐ„ ์‹คํ–‰ ์›Œ์ปค ํ”„๋กœ์„ธ์Šค ํ’€์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค. ํƒœ์Šคํฌ๊ฐ€ ํ์— ๋“ค์–ด๊ฐ€๊ณ  ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์›Œ์ปค์—์„œ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. KubernetesExecutor๋Š” ํƒœ์Šคํฌ๋งˆ๋‹ค Docker ์ด๋ฏธ์ง€์™€ ๋ฆฌ์†Œ์Šค ์š”๊ตฌ์‚ฌํ•ญ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒˆ๋กœ์šด Kubernetes ํŒŒ๋“œ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. Celery๋Š” ๋‚ฎ์€ ์ง€์—ฐ ์‹œ๊ฐ„(ํŒŒ๋“œ ์‹œ์ž‘ ์˜ค๋ฒ„ํ—ค๋“œ ์—†์Œ)์„ ์ œ๊ณตํ•˜๋ฉฐ ๋™์งˆ์  ์›Œํฌ๋กœ๋“œ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. KubernetesExecutor๋Š” ๋” ๊ฐ•๋ ฅํ•œ ๊ฒฉ๋ฆฌ์™€ ํƒœ์Šคํฌ๋ณ„ ๋ฆฌ์†Œ์Šค ์ œ์–ด๋ฅผ ์ œ๊ณตํ•˜๋ฉฐ, ์ •์  ์›Œ์ปค ํ’€ ๊ด€๋ฆฌ๊ฐ€ ๋ถˆํ•„์š”ํ•˜์—ฌ ์ด์งˆ์  ์›Œํฌ๋กœ๋“œ์— ์ด์ƒ์ ์ž…๋‹ˆ๋‹ค.

ํƒœ์Šคํฌ ์‹คํŒจ ์ฒ˜๋ฆฌ์—๋Š” ์–ด๋–ค ์ „๋žต์ด ์žˆ๋Š”๊ฐ€

Airflow๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์‹คํŒจ ์ฒ˜๋ฆฌ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. retries์™€ retry_delay๋กœ ์ง€์ˆ˜ ๋ฐฑ์˜คํ”„๋ฅผ ์ ์šฉํ•œ ์ž๋™ ์žฌ์‹œ๋„๋ฅผ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. on_failure_callback์€ ํƒœ์Šคํฌ ์‹คํŒจ ์‹œ ์ปค์Šคํ…€ ๋กœ์ง(Slack ์•Œ๋ฆผ, PagerDuty ์ธ์‹œ๋˜ํŠธ)์„ ํŠธ๋ฆฌ๊ฑฐํ•ฉ๋‹ˆ๋‹ค. trigger_rule๋กœ ํ•˜์œ„ ํƒœ์Šคํฌ์˜ ์‘๋‹ต ๋ฐฉ์‹์„ ์ œ์–ดํ•ฉ๋‹ˆ๋‹ค. all_success(๊ธฐ๋ณธ๊ฐ’), one_success, all_failed, none_failed_min_one_success๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ผ์‹œ์ ์ธ ์ธํ”„๋ผ ์žฅ์• ์— ๋Œ€ํ•ด์„œ๋Š” retry_exponential_backoff=True ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์žฌ์‹œ๋„ ๊ฐ„ ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ ์ง„์ ์œผ๋กœ ๋Š˜๋ฆฝ๋‹ˆ๋‹ค. SLA(3.x์—์„œ๋Š” Deadline Alerts)๋Š” ์‹คํ–‰ ์‹œ๊ฐ„์„ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜๊ณ , ํƒœ์Šคํฌ๊ฐ€ ์˜ˆ์ƒ ์‹คํ–‰ ์‹œ๊ฐ„์„ ์ดˆ๊ณผํ•˜๋ฉด ์ฝœ๋ฐฑ์„ ๋ฐœ๋™ํ•ฉ๋‹ˆ๋‹ค.

Airflow ํ”„๋กœ๋•์…˜ ๋ฐฐํฌ ๋ชจ๋ฒ” ์‚ฌ๋ก€

Airflow๋ฅผ ๋Œ€๊ทœ๋ชจ๋กœ ์•ˆ์ •์ ์œผ๋กœ ์šด์˜ํ•˜๋ ค๋ฉด ์˜ฌ๋ฐ”๋ฅธ DAG ์ž‘์„ฑ์„ ๋„˜์–ด์„œ ์—ฌ๋Ÿฌ ์šด์˜ ํŒจํ„ด์— ์ฃผ์˜๋ฅผ ๊ธฐ์šธ์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋ฉฑ๋“ฑ์„ฑ์„ ๊ฐ–์ถ˜ ํƒœ์Šคํฌ. ๋ชจ๋“  ํƒœ์Šคํฌ๋Š” ๋™์ผํ•œ ์ž…๋ ฅ์œผ๋กœ ์—ฌ๋Ÿฌ ๋ฒˆ ์‹คํ–‰ํ•ด๋„ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‹จ์ˆœ INSERT ๋Œ€์‹  INSERT ... ON CONFLICT๋‚˜ MERGE๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๋…ผ๋ฆฌ์  ๋‚ ์งœ ๊ธฐ์ค€์œผ๋กœ ํŒŒํ‹ฐ์…”๋‹ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ์ค‘๋ณต ์—†์ด ์•ˆ์ „ํ•œ ์žฌ์‹œ๋„์™€ ๋ฐฑํ•„์ด ๊ฐ€๋Šฅํ•ด์ง‘๋‹ˆ๋‹ค.

์ž‘๊ณ  ์ง‘์ค‘๋œ DAG. ์ˆ˜์‹ญ ๊ฐœ์˜ ํƒœ์Šคํฌ๋ฅผ ๊ฐ€์ง„ ๋ชจ๋†€๋ฆฌ์‹ DAG ๊ตฌ์ถ•์„ ์ง€์–‘ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋ณต์žกํ•œ ํŒŒ์ดํ”„๋ผ์ธ์€ ์—์…‹(์ด์ „ datasets)์œผ๋กœ ์—ฐ๊ฒฐ๋œ ์—ฌ๋Ÿฌ DAG๋กœ ๋ถ„ํ• ํ•ฉ๋‹ˆ๋‹ค. ์ž‘์€ DAG๋Š” ํŒŒ์‹ฑ์ด ๋น ๋ฅด๊ณ  ๋””๋ฒ„๊น…์ด ์‰ฌ์šฐ๋ฉฐ, ๋ถ€๋ถ„์ ์ธ ํŒŒ์ดํ”„๋ผ์ธ ์žฌ์‹œ์ž‘์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

์—ฐ๊ฒฐ ๊ด€๋ฆฌ. ๋ชจ๋“  ์ธ์ฆ ์ •๋ณด๋ฅผ Airflow์˜ ์—ฐ๊ฒฐ ๊ด€๋ฆฌ์ž ๋˜๋Š” ์™ธ๋ถ€ ์‹œํฌ๋ฆฟ ๋ฐฑ์—”๋“œ(AWS Secrets Manager, HashiCorp Vault)์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. DAG ํŒŒ์ผ์— ์ธ์ฆ ์ •๋ณด๋ฅผ ํ•˜๋“œ์ฝ”๋”ฉํ•ด์„œ๋Š” ์•ˆ ๋ฉ๋‹ˆ๋‹ค. Airflow 3.x๋Š” ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์˜ ์—ฐ๊ฒฐ ํ•„๋“œ๋ฅผ ์ €์žฅ ์‹œ ์•”ํ˜ธํ™”ํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ์•Œ๋ฆผ. Airflow ๋ฉ”ํŠธ๋ฆญ์„ Prometheus ๋˜๋Š” StatsD๋กœ ๋‚ด๋ณด๋ƒ…๋‹ˆ๋‹ค. scheduler_heartbeat, dag_processing.total_parse_time, executor.queued_tasks๋ฅผ ์ถ”์ ํ•ฉ๋‹ˆ๋‹ค. Airflow 3.2์—์„œ๋Š” OpenTelemetry ํŠธ๋ ˆ์ด์Šค๊ฐ€ ์ถ”๊ฐ€๋˜์–ด ํŒŒ์ดํ”„๋ผ์ธ์˜ ์—”๋“œํˆฌ์—”๋“œ ๊ด€์ธก ๊ฐ€๋Šฅ์„ฑ์ด ํ™•๋ณด๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํŒŒ์ดํ”„๋ผ์ธ ์˜ค์ผ€์ŠคํŠธ๋ ˆ์ด์…˜์„ ํฌํ•จํ•œ ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด ๋ฉด์ ‘์„ ์ค€๋น„ํ•˜๊ณ  ๊ณ„์‹ ๋‹ค๋ฉด, SharpSkill์˜ Airflow ๋ฉด์ ‘ ๋Œ€๋น„ ๋ชจ๋“ˆ์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ”„๋กœ๋•์…˜ ํ™˜๊ฒฝ์˜ ์‹ค์ œ ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์—ฐ์Šต ๋ฌธ์ œ๊ฐ€ ์ค€๋น„๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

์—ฐ์Šต์„ ์‹œ์ž‘ํ•˜์„ธ์š”!

๋ฉด์ ‘ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์™€ ๊ธฐ์ˆ  ํ…Œ์ŠคํŠธ๋กœ ์ง€์‹์„ ํ…Œ์ŠคํŠธํ•˜์„ธ์š”.

๊ฒฐ๋ก 

  • Airflow 3.2๋Š” DAG ์ž‘์„ฑ์„ ์œ„ํ•œ ์•ˆ์ • API๋กœ Task SDK(airflow.sdk)๋ฅผ ๋„์ž…ํ–ˆ์Šต๋‹ˆ๋‹ค. ํ–ฅํ›„ ํ˜ธํ™˜์„ฑ ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ์ž„ํฌํŠธ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค
  • ์—์…‹ ํŒŒํ‹ฐ์…˜์€ ํŒŒํ‹ฐ์…˜ ์ˆ˜์ค€์˜ ๋ฐ์ดํ„ฐ ์ธ์‹ ์Šค์ผ€์ค„๋ง์„ ์ง€์›ํ•˜์—ฌ, ๋ถ€๋ถ„์  ๋ฐ์ดํ„ฐ ์—…๋ฐ์ดํŠธ๋กœ ์ธํ•œ ๋ถˆํ•„์š”ํ•œ ํŒŒ์ดํ”„๋ผ์ธ ํŠธ๋ฆฌ๊ฑฐ๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค
  • PythonOperator์˜ ๋„ค์ดํ‹ฐ๋ธŒ ๋น„๋™๊ธฐ ์ง€์›์œผ๋กœ ์ปค์Šคํ…€ deferrable ์˜คํผ๋ ˆ์ดํ„ฐ ์—†์ด I/O ์ง‘์•ฝ์  ์›Œํฌ๋กœ๋“œ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค
  • .expand()๋ฅผ ํ™œ์šฉํ•œ ๋™์  ํƒœ์Šคํฌ ๋งคํ•‘์€ DAG ์ฝ”๋“œ ๋ณ€๊ฒฝ ์—†์ด ๊ฐ€๋ณ€์ ์ธ ์›Œํฌ๋กœ๋“œ ํฌ๊ธฐ์— ๋Ÿฐํƒ€์ž„์—์„œ ๋Œ€์‘ํ•ฉ๋‹ˆ๋‹ค
  • ๋ฉด์ ‘ ์ค€๋น„ ์‹œ DAG ๋ฉ”์ปค๋‹ˆ์ฆ˜, ์ต์Šคํํ„ฐ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„(Celery vs. Kubernetes), XCom ์ œํ•œ์‚ฌํ•ญ, ๋ฉฑ๋“ฑ์„ฑ ํŒจํ„ด์„ ๋ฐ˜๋“œ์‹œ ์ˆ™์ง€ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค
  • ํ”„๋กœ๋•์…˜ ๋ฐฐํฌ์—์„œ๋Š” ์ž‘๊ณ  ๋ฉฑ๋“ฑํ•œ DAG, ์™ธ๋ถ€ ์‹œํฌ๋ฆฟ ๊ด€๋ฆฌ, ๊ด€์ธก ๊ฐ€๋Šฅ์„ฑ ํ”Œ๋žซํผ์œผ๋กœ์˜ ๋ฉ”ํŠธ๋ฆญ ๋‚ด๋ณด๋‚ด๊ธฐ๊ฐ€ ํšจ๊ณผ์ ์ž…๋‹ˆ๋‹ค

์—ฐ์Šต์„ ์‹œ์ž‘ํ•˜์„ธ์š”!

๋ฉด์ ‘ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์™€ ๊ธฐ์ˆ  ํ…Œ์ŠคํŠธ๋กœ ์ง€์‹์„ ํ…Œ์ŠคํŠธํ•˜์„ธ์š”.

๊ณต์œ 

๊ด€๋ จ ๊ธฐ์‚ฌ

dbt data transformations and testing tutorial 2026

dbt 2026 ์™„๋ฒฝ ๊ฐ€์ด๋“œ: ๋ฐ์ดํ„ฐ ๋ณ€ํ™˜, ํ…Œ์ŠคํŠธ ์ „๋žต, ๋ฉด์ ‘ ์งˆ๋ฌธ ์ด์ •๋ฆฌ

dbt๋ฅผ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ ๋ณ€ํ™˜์˜ ํ•ต์‹ฌ ๊ฐœ๋…๋ถ€ํ„ฐ ์‹ค๋ฌด๊นŒ์ง€, ๋ ˆ์ด์–ด๋“œ ๋ชจ๋ธ๋ง, ์ธํฌ๋ฆฌ๋ฉ˜ํƒˆ ์ „๋žต, ํ…Œ์ŠคํŠธ ๋ฐฉ๋ฒ•๋ก , ๊ทธ๋ฆฌ๊ณ  2026๋…„ ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ๋ฉด์ ‘์—์„œ ์ž์ฃผ ์ถœ์ œ๋˜๋Š” ์งˆ๋ฌธ์„ ์ฝ”๋“œ ์˜ˆ์ œ์™€ ํ•จ๊ป˜ ์ƒ์„ธํžˆ ๋‹ค๋ฃน๋‹ˆ๋‹ค.

Apache Spark 4 ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง Structured Streaming ํŒŒ์ดํ”„๋ผ์ธ ๋‹ค์ด์–ด๊ทธ๋žจ

2026๋…„ Apache Spark 4 ์™„๋ฒฝ ๊ฐ€์ด๋“œ: ์‹ ๊ทœ ๊ธฐ๋Šฅ, Structured Streaming, ๋ฉด์ ‘ ์งˆ๋ฌธ

Apache Spark 4์˜ ํ•ต์‹ฌ ์‹ ๊ทœ ๊ธฐ๋Šฅ์ธ ANSI SQL ๋ชจ๋“œ, VARIANT ๋ฐ์ดํ„ฐ ํƒ€์ž…, ์‹ค์‹œ๊ฐ„ ์ŠคํŠธ๋ฆฌ๋ฐ ๋ชจ๋“œ, Spark Connect๋ฅผ ์‹ฌ์ธต ๋ถ„์„ํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง ๋ฉด์ ‘์„ ์œ„ํ•œ ํ•„์ˆ˜ ์งˆ๋ฌธ๊ณผ ๋‹ต๋ณ€๋„ ํ•จ๊ป˜ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

Apache Kafka ์ŠคํŠธ๋ฆฌ๋ฐ ์•„ํ‚คํ…์ฒ˜์™€ ํŒŒํ‹ฐ์…˜ ๋ฐ์ดํ„ฐ ํ๋ฆ„ ๋‹ค์ด์–ด๊ทธ๋žจ

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ฅผ ์œ„ํ•œ Apache Kafka: ์ŠคํŠธ๋ฆฌ๋ฐ, ํŒŒํ‹ฐ์…˜, ๋ฉด์ ‘ ์งˆ๋ฌธ

๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ฅผ ์œ„ํ•œ Apache Kafka ์‹ฌ์ธต ๋ถ„์„. Kafka 4.x์™€ KRaft๋ฅผ ํ™œ์šฉํ•œ ์ŠคํŠธ๋ฆฌ๋ฐ ์•„ํ‚คํ…์ฒ˜, ํŒŒํ‹ฐ์…˜ ์ „๋žต, ์ปจ์Šˆ๋จธ ๊ทธ๋ฃน, ๊ธฐ์ˆ  ๋ฉด์ ‘ ๋นˆ์ถœ ์งˆ๋ฌธ์„ ์‹ค์ „ ์ฝ”๋“œ ์˜ˆ์ œ์™€ ํ•จ๊ป˜ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.