
Feature Engineering
Categorical encoding, scaling, normalization, feature selection, feature creation, pipelines
1. Which encoding type should be used for a nominal categorical variable with few distinct categories (fewer than 10)?
Answer
One-Hot Encoding is well suited to nominal variables with few categories because it creates one binary column per category without introducing an artificial order. Unlike Label Encoding, which assigns integers (0, 1, 2, ...), One-Hot Encoding prevents the model from inferring an ordinal relationship between categories that does not exist.
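To make the contrast concrete, here is a minimal pure-Python sketch of both encodings. The `one_hot_encode` helper and the `colors` data are hypothetical names invented for illustration; this is not the scikit-learn or pandas API, only the underlying idea.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical nominal feature with three categories.
colors = ["red", "green", "blue", "green"]

categories, encoded = one_hot_encode(colors)

# Label encoding, for contrast: integer codes imply blue < green < red,
# an ordering that has no meaning for colors.
label_codes = {category: index for index, category in enumerate(categories)}

print(categories)   # ['blue', 'green', 'red']
for value, row in zip(colors, encoded):
    print(value, row, "label code:", label_codes[value])
```

Each one-hot row contains exactly one 1, so no category is numerically "larger" than another, whereas the label codes could be read by a linear model as a ranking.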
2. What is the main difference between StandardScaler and MinMaxScaler?
Answer
StandardScaler centers each feature at 0 with a standard deviation of 1 (z-score), while MinMaxScaler rescales each feature to a fixed range, [0, 1] by default. Both are sensitive to outliers, because extreme values pull the mean and inflate the standard deviation. MinMaxScaler is usually affected more: its minimum and maximum are set entirely by the most extreme points, so a single outlier can compress all other values into a narrow band.
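The arithmetic behind the two scalers can be sketched with the standard library alone. The functions `standardize` and `min_max_scale` and the `incomes` data are hypothetical illustrations, not the scikit-learn classes themselves; the sketch assumes the population standard deviation (no Bessel correction) for the z-score.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature: four typical values and one extreme outlier.
incomes = [30.0, 35.0, 40.0, 45.0, 1000.0]

z_scores = standardize(incomes)
min_max = min_max_scale(incomes)

print([round(z, 3) for z in z_scores])
print([round(m, 3) for m in min_max])
```

With this data the four typical values end up packed close together under both transformations, which illustrates why neither scaler is a good choice when outliers dominate the feature.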
3. Which scaler should be preferred when data contains significant outliers?
Answer
RobustScaler centers on the median and scales by the interquartile range (IQR) instead of the mean and standard deviation, which makes it robust to outliers. Extreme values barely change the median or the IQR, whereas they can strongly distort the statistics used by StandardScaler (mean, standard deviation) and MinMaxScaler (minimum, maximum).
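A minimal standard-library sketch of median/IQR scaling follows. The `robust_scale` function and the `incomes` data are hypothetical, and the quartiles are assumed to be the 25th and 75th percentiles with linear interpolation; the sketch shows the formula rather than the scikit-learn implementation.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median, then divide by the interquartile range."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    iqr = q3 - q1
    return [(v - center) / iqr for v in values]


# Hypothetical feature: four typical values and one extreme outlier.
incomes = [30.0, 35.0, 40.0, 45.0, 1000.0]

scaled = robust_scale(incomes)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 96.0]
```

The typical values keep a usable spread (from -1.0 to 0.5) because the median and IQR ignore the outlier; the outlier itself stays extreme after scaling rather than squeezing everything else together.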
What is Label Encoding and when is it appropriate to use it?
What problem can Target Encoding cause and how to avoid it?
Why is feature scaling necessary before training a regularized linear regression model?