
Decision Trees & Ensembles
Decision Trees, Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning, feature importance
1. What is a decision tree in Machine Learning?
Answer
A decision tree is a Machine Learning model that makes predictions by splitting data according to hierarchical decision rules. Each internal node represents a test on a feature, each branch represents the outcome of the test, and each leaf represents a final prediction. This model is intuitive and easily interpretable, making it an excellent choice for understanding factors influencing a decision.
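A minimal sketch of this idea, assuming scikit-learn is installed (the shallow depth and the iris dataset are illustrative choices, not requirements):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree keeps the printed rules readable.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each internal node tests one feature against a threshold;
# each leaf holds the predicted class.
print(export_text(clf, feature_names=load_iris().feature_names))
```

`export_text` renders the learned hierarchy of decision rules as plain text, which is one way to see why trees are considered easily interpretable.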
2. What criterion is used by default in scikit-learn to measure the quality of a split in a classification tree?
Answer
The Gini index is the default criterion in scikit-learn for classification trees. It measures the impurity of a node as the probability that a randomly chosen element would be misclassified if it were labeled at random according to the node's class distribution. A Gini of 0 means a pure node (a single class), while a higher Gini indicates greater class diversity.
3. What is the main difference between Gini index and entropy as split criteria?
Answer
Gini index and entropy generally produce very similar trees, but Gini is slightly faster to compute because it does not require evaluating logarithms. Entropy, rooted in information theory, may sometimes yield slightly more balanced splits. In practice, the choice between the two rarely has a significant impact on model performance.
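A quick comparison of the two criteria on a binary node (a sketch; the two helper functions are illustrative, written here for the two-class case only):

```python
import numpy as np

def gini(p):
    # Binary-class Gini impurity for positive-class probability p.
    return 1.0 - (p ** 2 + (1 - p) ** 2)

def entropy(p):
    # Binary-class entropy in bits; the 0 * log(0) term is taken as 0.
    terms = [q * np.log2(q) for q in (p, 1 - p) if q > 0]
    return -sum(terms)

# Both vanish on pure nodes and peak at p = 0.5;
# entropy needs a logarithm while Gini only squares probabilities.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p={p:.1f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

The two curves have the same shape up to scale, which is why they usually select the same (or nearly the same) splits.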
4. What is pruning in the context of decision trees?
5. Which hyperparameter controls the maximum depth of a decision tree in scikit-learn?
6. What is a Random Forest?