Data Science & ML

Decision Trees & Ensembles

Decision Trees, Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning, feature importance

24 interview questions
Mid-Level
1

What is a decision tree in Machine Learning?

Answer

A decision tree is a Machine Learning model that makes predictions by splitting data according to hierarchical decision rules. Each internal node represents a test on a feature, each branch represents the outcome of the test, and each leaf represents a final prediction. This model is intuitive and easily interpretable, making it an excellent choice for understanding factors influencing a decision.
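The node/branch/leaf structure described above can be sketched as nested feature tests in plain Python. This is a hypothetical iris-style example; the feature names and thresholds are illustrative, not learned from data:

```python
# Hand-written decision tree for an iris-style classification task.
# Each `if` is an internal node's feature test, each branch is a test
# outcome, and each `return` is a leaf's final prediction.
def predict(petal_length, petal_width):
    if petal_length <= 2.5:          # root node: test on petal_length
        return "setosa"              # leaf reached by the left branch
    else:
        if petal_width <= 1.7:       # second internal node
            return "versicolor"      # leaf
        else:
            return "virginica"       # leaf

print(predict(1.4, 0.2))  # -> setosa
```

A trained tree is exactly this kind of rule hierarchy, which is why the model is so easy to inspect and explain.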

2

What criterion is used by default in scikit-learn to measure the quality of a split in a classification tree?

Answer

The Gini index is the default criterion in scikit-learn for classification trees. It measures the impurity of a node by calculating the probability that an element would be misclassified if randomly classified according to the class distribution. A Gini of 0 means a pure node (single class), while a higher Gini indicates greater class diversity.
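The computation behind the Gini index can be written in a few lines of stdlib Python (the function name is illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: G = 1 - sum over classes of p_k**2,
    i.e. the probability of mislabeling a random element drawn from
    the node if it were labeled according to the node's class
    distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

gini(["spam"] * 6)                  # pure node -> 0.0
gini(["spam"] * 3 + ["ham"] * 3)    # 50/50 node -> 0.5 (maximal for 2 classes)
```

In scikit-learn this corresponds to `DecisionTreeClassifier(criterion="gini")`, which is the default value of the `criterion` parameter.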

3

What is the main difference between Gini index and entropy as split criteria?

Answer

Gini index and entropy generally produce very similar trees, but Gini is slightly faster to compute as it doesn't require logarithmic calculation. Entropy, based on information theory, may sometimes create slightly more balanced splits. In practice, the choice between the two rarely has a significant impact on model performance.
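The two criteria can be compared side by side with stdlib Python; both are minimal (0) on a pure node and maximal on a 50/50 node, which is why they usually pick similar splits:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: G = 1 - sum(p_k**2). No logarithm needed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: H = -sum(p_k * log2(p_k)).
    Costs one logarithm per class, hence slightly slower than Gini."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

node = ["a"] * 5 + ["b"] * 5       # 50/50 node: both criteria are maximal
gini(node), entropy(node)          # -> (0.5, 1.0)
```

Switching in scikit-learn is just `criterion="entropy"` on `DecisionTreeClassifier`; the per-class logarithm is the extra cost the answer refers to.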

4

What is pruning in the context of decision trees?

Answer

Pruning is the removal of branches that contribute little predictive power, in order to reduce overfitting. Pre-pruning stops growth early (for example via max_depth or min_samples_split), while post-pruning grows the full tree and then cuts branches back, as with cost-complexity pruning (ccp_alpha in scikit-learn). A pruned tree generalizes better and is easier to interpret.

5

Which hyperparameter controls the maximum depth of a decision tree in scikit-learn?

Answer

The max_depth hyperparameter limits how many levels the tree may grow. Its default is None, which lets the tree expand until all leaves are pure or contain fewer than min_samples_split samples. Limiting max_depth is one of the simplest ways to curb overfitting.

6

What is a Random Forest?

Answer

A Random Forest is an ensemble of decision trees, each trained on a bootstrap sample of the data (bagging) and considering only a random subset of features at each split. Predictions are aggregated by majority vote for classification or by averaging for regression. Randomizing the trees decorrelates their errors, which typically gives much better generalization than a single tree.
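The bagging-plus-voting core of a Random Forest can be sketched with stdlib Python, using hand-rolled decision stumps (depth-1 trees) as the base learners. All function names here are hypothetical, and a real Random Forest additionally grows deep trees and samples a random feature subset at every split:

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold) split with the fewest
    misclassifications when each side predicts its majority class."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            pred_l = Counter(left).most_common(1)[0][0]
            pred_r = Counter(right).most_common(1)[0][0]
            err = sum(yi != pred_l for yi in left) + sum(yi != pred_r for yi in right)
            if err < best_err:
                best_err, best = err, (f, t, pred_l, pred_r)
    return best  # None if no valid split exists (e.g. all rows identical)

def predict_stump(stump, row):
    f, t, pred_l, pred_r = stump
    return pred_l if row[f] <= t else pred_r

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n, forest = len(X), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n rows with replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        if stump is not None:                        # skip degenerate single-class resamples
            forest.append(stump)
    return forest

def predict_forest(forest, row):
    """Aggregate the ensemble by majority vote (classification)."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]
```

Because each stump sees a different resample, their individual mistakes differ, and the majority vote smooths them out; this is the variance-reduction effect that makes the ensemble stronger than any single tree.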
