Data Science & ML

Feature Engineering

Categorical encoding, scaling, normalization, feature selection, feature creation, pipelines

22 interview questions
Mid-Level
1

Which encoding type should be used for a nominal categorical variable with few distinct categories (fewer than 10)?

Answer

One-Hot Encoding is ideal for nominal variables with few categories because it creates one binary column per category without introducing an artificial ordering. Label Encoding, by contrast, assigns integers (0, 1, 2, ...), which a model can interpret as an ordinal relationship between categories that does not exist. With only a handful of categories, the extra columns One-Hot adds also keep dimensionality manageable.
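A minimal pure-Python sketch of one-hot encoding, to make "one binary column per category" concrete. The function name `one_hot_encode` and the sample colors are illustrative assumptions; in practice you would typically use scikit-learn's `OneHotEncoder` or pandas' `get_dummies`.

```python
def one_hot_encode(values):
    """Map each category to a binary indicator vector (one column per category)."""
    # Sort the distinct categories so the column order is deterministic
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "hot" column per row
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
columns, encoded = one_hot_encode(colors)
print(columns)  # ['blue', 'green', 'red']
for row in encoded:
    print(row)
```

Each row has a single 1, and no column value is "larger" than another, so the model sees no implied order between `blue`, `green`, and `red`. Label Encoding the same data as 0, 1, 2 would suggest blue < green < red.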

2

What is the main difference between StandardScaler and MinMaxScaler?

Answer

StandardScaler rescales each feature to mean 0 and standard deviation 1 (z-score), while MinMaxScaler maps values linearly into a fixed range, [0, 1] by default. Neither is robust to outliers: the mean and standard deviation are both pulled by extreme values. MinMaxScaler is usually affected even more, because a single extreme value sets the minimum or maximum and compresses all other values into a narrow band.
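The two formulas can be sketched in plain Python to see how one outlier affects each. This is an illustrative sketch, not scikit-learn's implementation; the helper names and the sample data are assumptions. It uses the population standard deviation (ddof=0), which is what `StandardScaler` uses.

```python
import statistics


def standard_scale(xs):
    """z-score: (x - mean) / std, with the population std (ddof=0)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # assumes a non-constant feature (sigma > 0)
    return [(x - mu) / sigma for x in xs]


def min_max_scale(xs, lo=0.0, hi=1.0):
    """Map the observed [min, max] linearly onto [lo, hi]."""
    x_min, x_max = min(xs), max(xs)  # assumes x_max > x_min
    return [lo + (x - x_min) * (hi - lo) / (x_max - x_min) for x in xs]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier

z = standard_scale(data)
mm = min_max_scale(data)
print([round(v, 2) for v in z])
print([round(v, 3) for v in mm])
```

With min-max scaling the four ordinary points all land at or below about 0.03, because the outlier alone defines the top of the range. Standardization also bunches them together (all near -0.5), since the outlier inflates both the mean and the standard deviation.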

3

Which scaler should be preferred when data contains significant outliers?

Answer

RobustScaler is the usual choice. It centers data on the median and scales by the interquartile range (IQR) instead of the mean and standard deviation, which makes it robust to outliers: extreme values barely move these statistics, whereas the statistics behind StandardScaler and MinMaxScaler can be strongly biased by them.
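A minimal sketch of the median/IQR transform in plain Python, applied to the same kind of outlier-laden data. The helper name `robust_scale` and the sample values are assumptions; it mirrors the defaults of scikit-learn's `RobustScaler` (centering on the median, scaling by the 25th to 75th percentile range), using linearly interpolated quartiles.

```python
import statistics


def robust_scale(xs):
    """(x - median) / IQR, with IQR taken from the 25th and 75th percentiles."""
    med = statistics.median(xs)
    # "inclusive" quartiles interpolate linearly between data points
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1  # assumes IQR > 0
    return [(x - med) / iqr for x in xs]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
print(robust_scale(data))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The median (3.0) and IQR (2.0) ignore the outlier, so the ordinary points keep a useful spread between -1 and 0.5. Note that the outlier is neither removed nor clipped; it simply ends up far from the rest, so robust scaling does not replace dedicated outlier handling.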

4

What is Label Encoding and when is it appropriate to use it?

5

What problem can Target Encoding cause, and how can it be avoided?

6

Why is feature scaling necessary before training a regularized linear regression model?
