Data Science & ML

Unsupervised ML

K-Means, hierarchical clustering, DBSCAN, PCA, t-SNE, UMAP, silhouette score, elbow method

22 interview questions · Mid-Level
1

What is the main difference between supervised and unsupervised learning?

Answer

Supervised learning trains on labeled data to predict a known target value (the label). Unsupervised learning works with unlabeled data and has no predefined target variable: it looks for hidden structure, such as natural groupings, lower-dimensional representations, or anomalies. K-Means, PCA, and DBSCAN are typical unsupervised algorithms.

2

How does the K-Means algorithm work to partition data?

Answer

K-Means is an iterative algorithm that partitions data into K clusters. It randomly initializes K centroids, then alternates between two steps: assigning each point to its nearest centroid (assignment step) and recomputing each centroid as the mean of the points assigned to it (update step). The algorithm has converged when assignments no longer change; in practice it also stops once a maximum number of iterations is reached.
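
The two alternating steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the synthetic two-blob dataset are assumptions made for the example, and the initialization simply picks K random data points as starting centroids.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-Means sketch; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialization: use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: find the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Convergence check: stop when no point changed cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical demo data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

Because the result depends on the random starting centroids, a common practice is to run the algorithm several times with different seeds and keep the run with the lowest inertia.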

3

Which method should be used to determine the optimal number of clusters K in K-Means?

Answer

The elbow method plots inertia (the sum of squared distances between each point and its assigned centroid) against K. Inertia keeps decreasing as K grows, so the goal is not to minimize it but to find the point where the curve bends into an elbow: beyond that K, adding clusters reduces inertia only marginally. Because the elbow can be ambiguous, the method is usually complemented by the silhouette score to validate cluster quality.
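
As a rough illustration of the elbow method, the sketch below computes inertia for K = 1 to 6 on synthetic data. Everything in it is an assumption made for the example: the helper `kmeans_inertia`, the number of restarts, and the dataset of three well-separated blobs. It prints the values instead of plotting them, so the elbow has to be read from the numbers.

```python
import numpy as np

def kmeans_inertia(X, k, n_restarts=20, max_iter=100, seed=0):
    """Run a basic K-Means several times; return the lowest inertia found."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_restarts):
        # Random initialization from k distinct data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Squared distance from every point to every centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centroids; an empty cluster keeps its old centroid.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Inertia: sum of squared distances to the nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

# Hypothetical data: three 2-D blobs of 60 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(c, 0.5, size=(60, 2)) for c in centers])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(f"K={k}: inertia={value:.1f}")
```

On data like this, inertia should fall sharply up to K = 3 and flatten afterwards, which is the elbow. On real data the bend is often less clear, which is why a second criterion such as the silhouette score is useful.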

4

What does the silhouette score measure in the context of clustering?

5

What is the range of silhouette score values, and how should a score of 0.7 be interpreted?

6

What major limitation of K-Means makes the algorithm unsuitable for non-spherical cluster shapes?
