Question 1

Which measure of central tendency is most appropriate for data containing extreme values (outliers)?

Accepted Answer

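The median is the most robust of the common measures of central tendency against outliers. It is the middle value of the sorted data, so it depends on the order of the values rather than their magnitude: making an extreme value even more extreme does not move it, as long as outliers make up less than half of the data. The mean, by contrast, sums all values, so a single extreme value can pull it far away from the bulk of the data. For example, for company salaries with a few highly paid executives, the median gives a better picture of the typical salary than the mean.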
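The salary example can be checked with Python's standard `statistics` module. This is a minimal sketch; the salary figures are hypothetical, chosen only to show one outlier.

```python
import statistics

# Hypothetical annual salaries in thousands of euros: five staff and one executive.
salaries = [40, 42, 45, 48, 50, 400]

mean_salary = statistics.mean(salaries)      # sums every value, so the outlier pulls it up
median_salary = statistics.median(salaries)  # average of the two middle sorted values

print(f"mean   = {mean_salary:.1f}")    # mean   = 104.2
print(f"median = {median_salary:.1f}")  # median = 46.5

# Making the outlier ten times larger moves the mean a lot but leaves the median unchanged.
extreme = salaries[:-1] + [4000]
print(f"mean   = {statistics.mean(extreme):.1f}")    # mean   = 704.2
print(f"median = {statistics.median(extreme):.1f}")  # median = 46.5
```

The median stays at 46.5 in both cases because only the position of the middle values matters, not how large the largest value is.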

Question 2

What is the variance of a dataset?

Accepted Answer

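Variance measures how spread out the data are around their mean. It is calculated as the average of the squared deviations from the mean; when estimating the variance from a sample, the sum of squared deviations is usually divided by n - 1 instead of n (Bessel's correction). Squaring makes every term non-negative, so deviations above and below the mean do not cancel out, and it gives extra weight to values far from the mean. Because of the squaring, the variance is expressed in the square of the original unit, which is why the standard deviation (the square root of the variance) is often used to describe dispersion in the original unit.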
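A short sketch of both definitions, computed by hand and compared with the standard library's `statistics.pvariance` (population, divides by n) and `statistics.variance` (sample, divides by n - 1). The dataset is a hypothetical example whose mean is 5 and whose squared deviations sum to 32.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sample estimate with Bessel's correction (divides by n - 1)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / (len(values) - 1)

print(population_variance(data))        # 4.0
print(round(sample_variance(data), 3))  # 4.571

# The standard library should agree with the hand-written versions.
print(statistics.pvariance(data), statistics.variance(data))
```

Which denominator to use depends on whether the data are the whole population or a sample drawn from it.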

Question 3

What is the relationship between standard deviation and variance?

Accepted Answer

The standard deviation is the square root of the variance. Taking the square root brings the dispersion measure back to the original unit of the data, which makes it easier to interpret. For example, if the data are in euros, the variance is in squared euros (hard to interpret), while the standard deviation is in euros. This is why the standard deviation is usually preferred for communicating dispersion intuitively.
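The euro example can be sketched as follows, assuming hypothetical daily spending figures and the population definitions from the standard `statistics` module.

```python
import math
import statistics

# Hypothetical daily spending, in euros.
spending_eur = [2, 4, 4, 4, 5, 5, 7, 9]

var_eur2 = statistics.pvariance(spending_eur)  # unit: squared euros
std_eur = math.sqrt(var_eur2)                  # unit: euros

print(f"variance = {var_eur2:.2f} EUR^2")  # variance = 4.00 EUR^2
print(f"std dev  = {std_eur:.2f} EUR")     # std dev  = 2.00 EUR

# statistics.pstdev computes the same quantity directly.
print(f"pstdev   = {statistics.pstdev(spending_eur):.2f} EUR")
```

A spread of 2 euros can be compared directly with the spending values themselves, whereas 4 squared euros cannot.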

