Question 1

Which encoding type should be used for a nominal categorical variable with few distinct categories (fewer than 10)?

Accepted Answer

One-Hot Encoding is well suited to nominal variables with few categories because it creates one binary column per category without introducing an artificial ordering. Label Encoding, by contrast, assigns integers (0, 1, 2...), which a model may interpret as an ordinal relationship that does not exist between the categories. With few categories, the extra columns that One-Hot Encoding adds remain manageable.
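The contrast between the two encodings can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names `label_encode` and `one_hot_encode` and the sample `colors` list are assumptions made up for this example, and categories are simply ordered alphabetically.

```python
def label_encode(values):
    """Map each category to an integer, using alphabetical order (hypothetical helper)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Build one binary column per category (hypothetical helper)."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Illustrative nominal variable: colors have no natural order.
colors = ["red", "green", "blue", "green"]

label_codes, label_categories = label_encode(colors)
one_hot_rows, one_hot_columns = one_hot_encode(colors)

# Label encoding implies blue (0) < green (1) < red (2), which is meaningless here.
print(label_categories, label_codes)

# One-hot encoding gives each category its own column; every row has exactly one 1.
print(one_hot_columns)
for row in one_hot_rows:
    print(row)
```

With the label codes, a distance-based or linear model would treat "red" as further from "blue" than "green" is. The one-hot rows carry no such ordering.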

Question 2

What is the main difference between StandardScaler and MinMaxScaler?

Accepted Answer

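StandardScaler centers each feature at 0 and rescales it to a standard deviation of 1 (z-score), while MinMaxScaler rescales each feature to a fixed range, usually [0, 1]. StandardScaler is somewhat less sensitive to outliers than MinMaxScaler: a single extreme value directly sets the minimum or maximum used by MinMaxScaler, whereas it only partly shifts the mean and standard deviation. Neither is robust, however, since the mean and standard deviation are themselves affected by extreme values.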
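The two transformations can be written out in a few lines of plain Python to see how one outlier affects each. This is a sketch of the arithmetic only, using the standard library rather than scikit-learn; the function names and the sample `data` are assumptions for illustration. The population standard deviation (`pstdev`) is used here, which matches a z-score computed with a divisor of n.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """z-score: subtract the mean, divide by the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # a constant column (std == 0) would need special handling
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative data: four ordinary values and one outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

z = standard_scale(data)
mm = min_max_scale(data)

print([round(v, 3) for v in z])
print([round(v, 3) for v in mm])
```

In both outputs the four ordinary values end up squeezed into a narrow band because the outlier dominates the statistics. With min-max scaling they all fall below 0.05 of the [0, 1] range.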

Question 3

Which scaler should be preferred when data contains significant outliers?

Accepted Answer

RobustScaler uses the median and the interquartile range (IQR) instead of the mean and standard deviation, which makes it robust to outliers. Extreme values barely affect these statistics, whereas the statistics used by StandardScaler and MinMaxScaler can be strongly distorted by outliers.
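The median/IQR idea can be sketched with the standard library as follows. This is an illustration of the technique under stated assumptions, not scikit-learn's code: `robust_scale` is a hypothetical helper, the quartiles are taken as the 25th and 75th percentiles with linear interpolation (`method="inclusive"`), and the sample data is made up.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    iqr = q3 - q1  # a zero IQR would need special handling
    return [(v - center) / iqr for v in values]


# Illustrative data: four ordinary values and one outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(data)
print(scaled)

# Making the outlier ten times larger leaves the scaling of the other values unchanged.
scaled_bigger_outlier = robust_scale([1.0, 2.0, 3.0, 4.0, 1000.0])
print(scaled_bigger_outlier[:4])
```

The ordinary values keep a usable spread around 0, and the outlier stays an outlier after scaling instead of compressing everything else.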

Feature Engineering

What is Label Encoding and when is it appropriate to use it?

What problem can Target Encoding cause and how to avoid it?
