Question 1

What is the main difference between supervised and unsupervised learning?

Accepted Answer

Supervised learning trains on labeled data to predict a known target value (the label). Unsupervised learning works with unlabeled data and seeks hidden structures or patterns without a predefined target variable: it explores the data to find natural groups, reduce dimensionality, or detect anomalies. K-Means, PCA, and DBSCAN are typical unsupervised algorithms.

Question 2

How does the K-Means algorithm work to partition data?

Accepted Answer

K-Means is an iterative algorithm that partitions data into K clusters. It randomly initializes K centroids, then alternates between two steps: assigning each point to its nearest centroid (assignment step) and recomputing each centroid as the mean of the points assigned to it (update step). The algorithm stops when assignments no longer change (convergence) or after a maximum number of iterations. Because the starting centroids are random, different runs can end in different partitions.
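
The two alternating steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name kmeans, the max_iter and seed parameters, the choice to start from K randomly sampled data points, and the sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means on a list of equal-length numeric tuples.

    Returns (centroids, labels). Names and defaults are illustrative.
    """
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Convergence: stop once no assignment has changed.
        if new_labels == labels:
            break
        labels = new_labels

        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


# Hypothetical sample data: two visually separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On well-separated data like this, the run typically recovers the two groups, but an unlucky random start can leave the algorithm in a worse local optimum, which is why practical implementations usually repeat it from several initializations.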

Question 3

Which method should be used to determine the optimal number of clusters K in K-Means?

Accepted Answer

The elbow method plots inertia (the sum of squared distances between each point and its centroid) against K. Inertia always decreases as K grows, so the goal is not to minimize it: the point where the curve bends into an elbow suggests a good K, because beyond it adding clusters no longer reduces inertia significantly. Since the elbow can be ambiguous, the method is usually complemented by the silhouette score to validate cluster quality.
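
As a rough sketch, the inertia curve and the silhouette check can be computed side by side in plain Python. Everything named here (farthest_point_init, kmeans, inertia, mean_silhouette, the toy points) is an assumption made for illustration. In particular, the K-Means helper uses a deterministic farthest-point start instead of random initialization so that the numbers are reproducible, and singleton clusters are given a silhouette of 0 by convention.

```python
import math


def farthest_point_init(points, k):
    # Deterministic start (an assumption for reproducibility, not random init):
    # begin at the first point, then keep adding the point that is farthest
    # from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    # Sum of squared distances between each point and its assigned centroid.
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    # Per point: a = mean distance to the rest of its own cluster,
    # b = mean distance to the nearest other cluster,
    # s = (b - a) / max(a, b). Singleton clusters score 0.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: three small, well-separated groups.
points = [
    (0, 0), (0, 1), (1, 0),     # group near the origin
    (10, 0), (10, 1), (11, 0),  # group to the right
    (5, 9), (5, 10), (6, 9),    # group at the top
]

inertias, silhouettes = {}, {}
for k in range(1, 5):
    centroids, labels = kmeans(points, k)
    inertias[k] = inertia(points, centroids, labels)
    if k >= 2:  # the silhouette needs at least two clusters
        silhouettes[k] = mean_silhouette(points, labels)
    print(k, round(inertias[k], 2), silhouettes.get(k))

best_k = max(silhouettes, key=silhouettes.get)
print("K with the highest mean silhouette:", best_k)
```

On this toy data, inertia falls from 316 at K=1 to 154 at K=2 and 4 at K=3, then only to about 3.17 at K=4, so the elbow sits at K=3, and the mean silhouette is also highest at K=3. Real data rarely gives such a clean bend, which is the reason for checking both measures.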

Unsupervised ML

What does the silhouette score measure in the context of clustering?

What is the range of silhouette score values and how to interpret a score of 0.7?
