
Decision Trees & Ensembles
Decision Trees, Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning, feature importance
1. What is a decision tree in Machine Learning?
Answer
A decision tree is a Machine Learning model that makes predictions by splitting data according to hierarchical decision rules. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf holds a final prediction. The model is intuitive and easy to interpret, which makes it a good choice when you need to understand which factors drive a prediction.
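The node, branch, and leaf structure described above can be sketched in a few lines of plain Python. This is only an illustration, not how scikit-learn stores trees: the `Node` and `Leaf` classes, the `predict` helper, and the loan features (`income`, `debt_ratio`) with their thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    # A leaf holds the final prediction.
    prediction: str


@dataclass
class Node:
    # An internal node tests one feature against a threshold.
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when value <= threshold
    right: Union["Node", Leaf]  # branch taken when value > threshold


def predict(tree, sample):
    """Walk from the root to a leaf, following one branch per test."""
    while isinstance(tree, Node):
        if sample[tree.feature] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.prediction


# Hypothetical hand-written tree (not learned from data).
tree = Node(
    "income", 40000,
    Leaf("reject"),
    Node("debt_ratio", 0.4, Leaf("approve"), Leaf("reject")),
)

print(predict(tree, {"income": 50000, "debt_ratio": 0.3}))
```

Reading the path taken for a sample (here: income above 40000, then debt ratio at most 0.4) is what makes the model interpretable.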
2. What criterion is used by default in scikit-learn to measure the quality of a split in a classification tree?
Answer
The Gini index is the default criterion in scikit-learn for classification trees. It measures the impurity of a node as the probability that a randomly chosen element would be misclassified if it were labeled at random according to the node's class distribution. A Gini of 0 means a pure node (single class); the value rises as the classes become more evenly mixed, up to 1 - 1/K for K equally frequent classes (0.5 for two classes).
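As a quick check of that definition, Gini impurity can be computed by hand as 1 minus the sum of squared class proportions. The `gini` function below is a minimal sketch written for illustration; it is not scikit-learn's internal implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure_node = ["spam", "spam", "spam", "spam"]
mixed_node = ["spam", "spam", "ham", "ham"]

print(gini(pure_node))   # single class, so impurity is 0
print(gini(mixed_node))  # two equally frequent classes
```

A split is considered good when the size-weighted average impurity of the child nodes is lower than the impurity of the parent.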
3. What is the main difference between Gini index and entropy as split criteria?
Answer
Gini index and entropy generally produce very similar trees, but Gini is slightly faster to compute because it does not require logarithms. Entropy, which comes from information theory, may sometimes create slightly more balanced splits. In practice, the choice between the two rarely has a significant impact on model performance.
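To see why the two criteria tend to agree, it can help to evaluate both for a two-class node where `p` is the proportion of the positive class. The helper names below are made up for this sketch; in scikit-learn the switch is the `criterion` parameter of `DecisionTreeClassifier` (`"gini"` or `"entropy"`).

```python
import math


def gini_binary(p):
    """Gini impurity for two classes: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    return 2 * p * (1 - p)


def entropy_binary(p):
    """Shannon entropy in bits for two classes; defined as 0 for a pure node."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Both measures are 0 for a pure node and peak at p = 0.5.
for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  gini={gini_binary(p):.3f}  entropy={entropy_binary(p):.3f}")
```

Both curves have the same shape (zero at the extremes, maximum at an even mix), which is why they usually rank candidate splits similarly; only the entropy version needs a logarithm.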
What is pruning in the context of decision trees?
Which hyperparameter controls the maximum depth of a decision tree in scikit-learn?