
Feature Engineering
Categorical encoding, scaling, normalization, feature selection, feature creation, pipelines
1. Which encoding type should be used for a nominal categorical variable with few distinct categories (fewer than 10)?
Answer
One-Hot Encoding is well suited to nominal variables with few categories because it creates one binary column per category without introducing an artificial ordering. Unlike Label Encoding, which assigns integers (0, 1, 2, ...), One-Hot Encoding keeps the model from inferring an ordinal relationship that does not exist between categories.
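To make the contrast concrete, here is a minimal sketch using scikit-learn's OneHotEncoder and OrdinalEncoder. The color values are a hypothetical toy feature, not data from the original material.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical nominal feature: colors have no natural order.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# One-hot: one binary column per category (categories are sorted alphabetically).
one_hot_encoder = OneHotEncoder(handle_unknown="ignore")
one_hot = one_hot_encoder.fit_transform(colors).toarray()
categories = one_hot_encoder.categories_[0].tolist()

# Integer encoding of the same feature: blue=0, green=1, red=2.
# A linear or distance-based model may read this as blue < green < red.
integer_codes = OrdinalEncoder().fit_transform(colors).ravel()

print(categories)
print(one_hot)
print(integer_codes)
```

Each one-hot row contains exactly one 1, so no category is numerically "larger" than another. The cost is one extra column per category, which is why this approach fits low-cardinality features best.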
2. What is the main difference between StandardScaler and MinMaxScaler?
Answer
StandardScaler centers each feature at 0 and scales it to a standard deviation of 1 (z-score), while MinMaxScaler rescales each feature to a fixed range, [0, 1] by default. Both are affected by outliers, since the mean and standard deviation are pulled by extreme values. MinMaxScaler is usually hit harder, however, because its minimum and maximum are set entirely by the most extreme points, which can squeeze the remaining values into a narrow band.
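The following sketch applies both scalers to the same small column so the two output ranges can be compared directly. The input values are a hypothetical example chosen to keep the arithmetic simple.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature with evenly spaced values.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# z-score: (x - mean) / std, giving mean 0 and standard deviation 1.
z_scores = StandardScaler().fit_transform(x).ravel()

# min-max: (x - min) / (max - min), giving values in [0, 1].
min_max = MinMaxScaler().fit_transform(x).ravel()

print("StandardScaler:", z_scores)
print("MinMaxScaler:  ", min_max)
```

The z-scores are unbounded and symmetric around 0, while the min-max output is bounded by construction. A bounded range can matter for models or activations that expect inputs in a fixed interval.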
3. Which scaler should be preferred when data contains significant outliers?
Answer
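RobustScaler centers each feature on its median and scales it by the interquartile range (IQR) instead of using the mean and standard deviation, which makes it robust to outliers. Extreme values barely move the median and IQR, whereas the statistics used by StandardScaler and MinMaxScaler can be strongly distorted by them. The outliers themselves remain in the scaled data; they simply no longer dictate the scale of the other values.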
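One way to see this robustness is to fit the scalers on two versions of the same column that differ only in how extreme the outlier is. This is a minimal sketch with hypothetical values; it relies on RobustScaler's default quantile range of 25 to 75.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Same hypothetical feature twice; only the outlier differs.
x_small_outlier = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
x_large_outlier = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

robust_small = RobustScaler().fit(x_small_outlier)
robust_large = RobustScaler().fit(x_large_outlier)
standard_small = StandardScaler().fit(x_small_outlier)
standard_large = StandardScaler().fit(x_large_outlier)

# Median and IQR are identical in both fits; the mean shifts with the outlier.
print("RobustScaler center/scale:", robust_small.center_, robust_small.scale_)
print("RobustScaler center/scale:", robust_large.center_, robust_large.scale_)
print("StandardScaler mean:", standard_small.mean_, standard_large.mean_)

# The inliers keep a usable spread; the outlier stays large but does not
# compress the other values.
robust_scaled = robust_small.transform(x_small_outlier).ravel()
print(robust_scaled)
```

In this toy case the fitted median and IQR do not change at all when the outlier grows tenfold, while the mean used by StandardScaler moves substantially.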
What is Label Encoding and when is it appropriate to use it?
What problem can Target Encoding cause and how to avoid it?