
Unsupervised ML
K-Means, hierarchical clustering, DBSCAN, PCA, t-SNE, UMAP, silhouette score, elbow method
1. What is the main difference between supervised and unsupervised learning?
Unsupervised learning works with unlabeled data and tries to discover hidden structure or patterns without a predefined target variable. Supervised learning learns from labeled examples to predict a known value (the label). Unsupervised learning instead explores the data to find natural groups, reduce dimensionality, or detect anomalies. K-Means, PCA, and DBSCAN are typical examples of unsupervised algorithms.
2. How does the K-Means algorithm work to partition data?
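K-Means is an iterative algorithm that partitions data into K clusters by minimizing the within-cluster sum of squared distances. It first initializes K centroids, either randomly or with a seeding scheme such as k-means++. It then alternates between two steps. In the assignment step, each point goes to its nearest centroid, usually measured by Euclidean distance. In the update step, each centroid is recalculated as the mean of the points assigned to it. The algorithm stops when assignments no longer change (or centroids move less than a tolerance) or after a maximum number of iterations.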
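The two alternating steps can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm. It assumes Euclidean distance and uses K randomly chosen data points as the initial centroids. The names `kmeans`, `squared_distance`, and `mean`, and the toy `data`, are hypothetical and do not come from any library.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's K-Means (illustrative sketch): returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Initialization: k distinct data points chosen at random become the centroids
    centroids = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    # Inertia: sum of squared distances from each point to its assigned centroid
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical toy data: three well-separated 2-D blobs
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 8.0), (9.0, 9.0),
        (1.0, 9.0), (1.5, 8.5), (2.0, 9.0)]
labels, centroids, inertia = kmeans(data, k=3, seed=0)
print("labels:", labels)
print("inertia:", round(inertia, 3))
```

Lloyd's algorithm only reaches a local optimum, so different seeds can end in different partitions. scikit-learn's `KMeans` addresses this by running several initializations (the `n_init` parameter) and keeping the one with the lowest inertia. It also uses k-means++ seeding by default instead of purely random starts.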
3. Which method should be used to determine the optimal number of clusters K in K-Means?
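The most common approach is the elbow method, which plots inertia (the sum of squared distances between each point and its assigned centroid) against K. Inertia generally decreases as K grows and reaches 0 when every point is its own cluster, so the goal is not to minimize it. Instead, look for the elbow: the K beyond which adding clusters no longer reduces inertia significantly. Because the elbow is often ambiguous, the choice is usually cross-checked with the silhouette score to validate cluster quality.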
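The elbow procedure can be sketched by running K-Means for a range of K values and recording the inertia for each. This is a minimal sketch built on a few assumptions. First, it replaces random initialization with a deterministic farthest-point seeding so the curve is reproducible. Second, `pick_elbow` is a hypothetical helper that returns the K with the largest second difference in inertia. That is only one simple heuristic; in practice the plotted curve is often just inspected by eye. All names and the toy data are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (assumption): start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans_inertia(points: List[Point], k: int, max_iter: int = 100) -> float:
    """Run Lloyd's K-Means and return the final inertia (within-cluster SSE)."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def pick_elbow(inertias: Dict[int, float]) -> int:
    """Hypothetical helper: choose the K with the sharpest bend, i.e. the largest
    second difference of the inertia curve (assumes consecutive K values)."""
    ks = sorted(inertias)
    return max(ks[1:-1], key=lambda k: (inertias[k - 1] - inertias[k])
               - (inertias[k] - inertias[k + 1]))


# Hypothetical toy data: three well-separated 2-D blobs
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 8.0), (9.0, 9.0),
        (1.0, 9.0), (1.5, 8.5), (2.0, 9.0)]
inertias = {k: kmeans_inertia(data, k) for k in range(1, 6)}
for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
print("elbow at K =", pick_elbow(inertias))
```

On this toy data the inertia falls steeply up to K = 3 and barely moves afterwards, so `pick_elbow` returns 3.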
What does the silhouette score measure in the context of clustering?
What is the range of silhouette score values, and how should a score of 0.7 be interpreted?
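Both questions concern the silhouette coefficient. For a point i, let a(i) be its mean distance to the other members of its own cluster. Let b(i) be the smallest mean distance from i to the members of any other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), which always lies between -1 and 1. The score of a whole clustering is the mean of s(i) over all points. Values near 1 mean points sit much closer to their own cluster than to the next one. Values near 0 indicate overlapping clusters, and negative values suggest misassigned points. Below is a minimal pure-Python sketch that assumes Euclidean distance; the function name and toy data are illustrative. In practice, `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least 2 clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to each cluster's members, excluding p itself
        mean_dist: Dict[int, float] = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                     if lab == c and j != i]
            mean_dist[c] = sum(dists) / len(dists) if dists else 0.0
        if labels.count(own) == 1:
            scores.append(0.0)  # usual convention: a point alone in its cluster scores 0
            continue
        a = mean_dist[own]                                   # cohesion a(i)
        b = min(mean_dist[c] for c in clusters if c != own)  # separation b(i)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three well-separated 2-D blobs
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 8.0), (9.0, 9.0),
        (1.0, 9.0), (1.5, 8.5), (2.0, 9.0)]
good_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # one cluster per blob
mixed_labels = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # every cluster spans all three blobs
score_good = silhouette_score(data, good_labels)
score_mixed = silhouette_score(data, mixed_labels)
print("one cluster per blob:", round(score_good, 3))
print("mixed clusters:", round(score_mixed, 3))
```

The labeling that matches the blobs scores well above the mixed labeling. This is the kind of comparison the silhouette score adds on top of the elbow method.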