Data Science & ML

Decision Trees & Ensembles

Decision Trees, Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning, feature importance

24 interview questions
Mid-Level
1

What is a decision tree in Machine Learning?

Answer

A decision tree is a Machine Learning model that makes predictions by splitting data according to hierarchical decision rules. Each internal node represents a test on a feature, each branch represents the outcome of the test, and each leaf represents a final prediction. This model is intuitive and easily interpretable, making it an excellent choice for understanding factors influencing a decision.
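The node/branch/leaf structure described above can be sketched as nested feature tests in plain Python. This is a hypothetical iris-style example; the feature names and thresholds are illustrative, not learned from data:

```python
# Hand-written decision tree for an iris-style classification task.
# Each `if` is an internal node's feature test, each branch is a test
# outcome, and each `return` is a leaf's final prediction.
def predict(petal_length, petal_width):
    if petal_length <= 2.5:          # root node: test on petal_length
        return "setosa"              # leaf reached by the left branch
    else:
        if petal_width <= 1.7:       # second internal node
            return "versicolor"      # leaf
        else:
            return "virginica"       # leaf

print(predict(1.4, 0.2))  # -> setosa
```

A trained tree is exactly this kind of rule hierarchy, which is why the model is so easy to inspect and explain.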

2

What criterion is used by default in scikit-learn to measure the quality of a split in a classification tree?

Answer

The Gini index is the default criterion in scikit-learn for classification trees. It measures the impurity of a node by calculating the probability that an element would be misclassified if randomly classified according to the class distribution. A Gini of 0 means a pure node (single class), while a higher Gini indicates greater class diversity.
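The computation behind the Gini index can be written in a few lines of stdlib Python (the function name is illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: G = 1 - sum over classes of p_k**2,
    i.e. the probability of mislabeling a random element drawn from
    the node if it were labeled according to the node's class
    distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

gini(["spam"] * 6)                  # pure node -> 0.0
gini(["spam"] * 3 + ["ham"] * 3)    # 50/50 node -> 0.5 (maximal for 2 classes)
```

In scikit-learn this corresponds to `DecisionTreeClassifier(criterion="gini")`, which is the default value of the `criterion` parameter.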

3

What is the main difference between Gini index and entropy as split criteria?

Answer

Gini index and entropy generally produce very similar trees, but Gini is slightly faster to compute as it doesn't require logarithmic calculation. Entropy, based on information theory, may sometimes create slightly more balanced splits. In practice, the choice between the two rarely has a significant impact on model performance.
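The two criteria can be compared side by side with stdlib Python; both are minimal (0) on a pure node and maximal on a 50/50 node, which is why they usually pick similar splits:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: G = 1 - sum(p_k**2). No logarithm needed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: H = -sum(p_k * log2(p_k)).
    Costs one logarithm per class, hence slightly slower than Gini."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

node = ["a"] * 5 + ["b"] * 5       # 50/50 node: both criteria are maximal
gini(node), entropy(node)          # -> (0.5, 1.0)
```

Switching in scikit-learn is just `criterion="entropy"` on `DecisionTreeClassifier`; the per-class logarithm is the extra cost the answer refers to.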

4

What is pruning in the context of decision trees?

Answer

Pruning is the removal of branches that contribute little predictive power, in order to reduce overfitting. Pre-pruning stops growth early (for example via max_depth or min_samples_split), while post-pruning grows the full tree and then cuts branches back, as with cost-complexity pruning (ccp_alpha in scikit-learn). A pruned tree generalizes better and is easier to interpret.

5

Which hyperparameter controls the maximum depth of a decision tree in scikit-learn?

Answer

The max_depth hyperparameter limits how many levels the tree may grow. Its default is None, which lets the tree expand until all leaves are pure or contain fewer than min_samples_split samples. Limiting max_depth is one of the simplest ways to curb overfitting.

6

What is a Random Forest?

Answer

A Random Forest is an ensemble of decision trees, each trained on a bootstrap sample of the data (bagging) and considering only a random subset of features at each split. Predictions are aggregated by majority vote for classification or by averaging for regression. Randomizing the trees decorrelates their errors, which typically gives much better generalization than a single tree.
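The bagging-plus-voting core of a Random Forest can be sketched with stdlib Python, using hand-rolled decision stumps (depth-1 trees) as the base learners. All function names here are hypothetical, and a real Random Forest additionally grows deep trees and samples a random feature subset at every split:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold) split with the fewest
    misclassifications when each side predicts its majority class."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            pred_l = Counter(left).most_common(1)[0][0]
            pred_r = Counter(right).most_common(1)[0][0]
            err = sum(yi != pred_l for yi in left) + sum(yi != pred_r for yi in right)
            if err < best_err:
                best_err, best = err, (f, t, pred_l, pred_r)
    return best  # None if no valid split exists (e.g. all rows identical)

def predict_stump(stump, row):
    f, t, pred_l, pred_r = stump
    return pred_l if row[f] <= t else pred_r

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n, forest = len(X), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n rows with replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        if stump is not None:                        # skip degenerate single-class resamples
            forest.append(stump)
    return forest

def predict_forest(forest, row):
    """Aggregate the ensemble by majority vote (classification)."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]
```

Because each stump sees a different resample, their individual mistakes differ, and the majority vote smooths them out; this is the variance-reduction effect that makes the ensemble stronger than any single tree.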
