Data Science & ML

ML Pipelines and Validation

Scikit-learn pipelines, cross-validation, GridSearchCV, RandomizedSearchCV, data leakage, stratification

22 interview questions ·
Mid-Level
1

What is the main benefit of using a scikit-learn Pipeline instead of applying transformations manually?

Answer

A Pipeline guarantees that the same transformations are applied consistently to the training data and the test data. By encapsulating every preprocessing and modeling step in a single object, it simplifies the code, helps prevent data leakage (during cross-validation, each transformer is refit only on the training folds), and makes the model easier to deploy to production.
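As a minimal sketch, assuming a synthetic dataset from `make_classification` and a `StandardScaler` + `LogisticRegression` pipeline (both chosen only for illustration), the code below packages preprocessing and the model as one object and shows that the scaler's statistics come from the training split alone:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and model are encapsulated in a single object.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit() fits the scaler on X_train only, transforms X_train, then fits the classifier.
pipe.fit(X_train, y_train)

# score() reuses the scaler statistics learned from X_train; it never refits on X_test.
test_accuracy = pipe.score(X_test, y_test)
test_predictions = pipe.predict(X_test)

# The fitted scaler's means were computed from the training split alone.
train_means = pipe.named_steps["scaler"].mean_
```

Because the whole chain is one estimator, the same `pipe` object can be passed to `cross_val_score` or `GridSearchCV`, or serialized for deployment, without re-implementing the preprocessing separately.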

2

Which methods do you call to train every step of a Pipeline and then make predictions?

Answer

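First call fit() to train the pipeline (each transformer is fitted and applied in order, then the final estimator is fitted), and then call predict() to get predictions; predict() only applies the already-fitted transformations before calling the final estimator. A typical regression or classification Pipeline does not offer a fit_predict shortcut: Pipeline.fit_predict() is available only when the final estimator itself implements fit_predict, as clustering estimators such as KMeans do.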
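A minimal sketch, assuming synthetic data and the illustrative estimators `LogisticRegression` and `KMeans`, of the fit-then-predict sequence and of when `fit_predict` is exposed on a pipeline:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classification pipeline: train every step, then predict on new data.
clf_pipe = make_pipeline(StandardScaler(), LogisticRegression())
clf_pipe.fit(X_train, y_train)       # step 1: fit all stages in order
y_pred = clf_pipe.predict(X_test)    # step 2: transform with fitted stages, then predict

# LogisticRegression has no fit_predict, so the pipeline does not expose one either.
clf_has_fit_predict = hasattr(clf_pipe, "fit_predict")

# KMeans implements fit_predict, so this pipeline does expose it.
cluster_pipe = make_pipeline(
    StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0)
)
cluster_has_fit_predict = hasattr(cluster_pipe, "fit_predict")
labels = cluster_pipe.fit_predict(X)
```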

3

What is data leakage in the context of machine learning?

Answer

Data leakage occurs when information from the test set, or from data that would only be available in the future, is inadvertently used during training. It can happen during preprocessing (for example, computing a mean over the entire dataset before splitting it) or through features that indirectly encode the target. The result is artificially high performance that does not generalize to new data.
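A minimal sketch of the preprocessing case, assuming synthetic data and `StandardScaler` as the illustrative transformer: fitting the scaler before the split lets the test rows influence the learned statistics, while splitting first (or wrapping the scaler in a Pipeline for cross-validation) keeps them out:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
X, y = make_classification(n_samples=300, n_features=6, random_state=42)

# LEAKY: mean/std are computed on every row, including rows that will become the test set.
leaky_scaler = StandardScaler().fit(X)

# SAFE: split first, then fit the scaler on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
safe_scaler = StandardScaler().fit(X_tr)

# The statistics differ because the leaky scaler has "seen" the test rows.
means_differ = not np.allclose(leaky_scaler.mean_, safe_scaler.mean_)

# In cross-validation, the Pipeline refits the scaler inside each fold on that fold's training part.
cv_scores = cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression()), X, y, cv=5
)
```

Leakage through target-encoding features cannot be caught this way; it has to be ruled out by checking what each feature would actually be known at prediction time.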

4

What is the role of ColumnTransformer in scikit-learn?

5

What is K-Fold cross-validation?


Other Data Science & ML interview topics

Python Fundamentals

Junior
25 questions

Python Object-Oriented Programming

Junior
20 questions

Python Data Structures

Junior
20 questions

Git Fundamentals

Junior
18 questions

SQL Fundamentals

Junior
20 questions

NumPy Fundamentals

Junior
22 questions

Pandas Fundamentals

Junior
22 questions

Jupyter & Google Colab

Junior
16 questions

SQL Joins and Advanced Queries

Mid-Level
22 questions

Advanced Pandas

Mid-Level
24 questions

Visualization with Matplotlib & Seaborn

Mid-Level
20 questions

Interactive Visualization with Plotly

Mid-Level
18 questions

Descriptive Statistics

Mid-Level
20 questions

Inferential Statistics

Mid-Level
24 questions

Web Scraping

Mid-Level
18 questions

BigQuery & Cloud Data

Mid-Level
18 questions

Feature Engineering

Mid-Level
22 questions

Supervised Machine Learning: Regression

Mid-Level
24 questions

Supervised Machine Learning: Classification

Mid-Level
24 questions

Decision Trees and Ensembles

Mid-Level
24 questions

Unsupervised ML

Mid-Level
22 questions

Time Series and Forecasting

Mid-Level
22 questions

Deep Learning Fundamentals

Senior
24 questions

TensorFlow & Keras

Senior
22 questions

CNNs and Image Classification

Senior
24 questions

RNNs and Sequences

Senior
22 questions

Transformers and Attention

Senior
24 questions

NLP and Hugging Face

Senior
24 questions

GenAI and LangChain

Senior
24 questions

MLOps and Deployment

Senior
24 questions
