Data Science & ML

Descriptive Statistics

Mean, median, standard deviation, quartiles, distributions, correlations, outliers, skewness

20 interview questions · Mid-Level
1

Which measure of central tendency is most appropriate for data containing extreme values (outliers)?

Answer

The median is the most robust measure of central tendency in the presence of outliers. It is the middle value of the sorted data, so it depends only on the ordering of the values and not on their magnitude; a few extreme values barely move it. The mean, by contrast, sums every value, so a single extreme value can pull it substantially. For example, in a company where a few executives are paid far more than everyone else, the median represents the typical salary better than the mean.
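As a rough illustration of the salary example, the sketch below compares the two measures using Python's standard `statistics` module. The salary figures are invented for illustration and are not taken from any real dataset.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last two stand in for executives.
salaries = [38, 40, 42, 45, 47, 50, 400, 900]

avg = mean(salaries)    # pulled upward by the two extreme values
mid = median(salaries)  # average of the two middle values, 45 and 47

print(f"mean = {avg}, median = {mid}")
```

With these made-up numbers, the mean sits far above what most employees earn, while the median stays within the range of the typical salaries.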

2

What is the variance of a dataset?

Answer

Variance measures how widely the data are spread around their mean. It is calculated as the average of the squared deviations from the mean. Squaring makes every deviation non-negative, so positive and negative deviations cannot cancel out, and it gives more weight to values far from the mean. Variance is expressed in the square of the original unit, which is why the standard deviation (the square root of the variance) is often used to interpret dispersion in the original unit.
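The calculation described above can be sketched step by step in Python. The dataset is an arbitrary example. The comparison with the sample variance (dividing by n - 1 instead of n) is an addition not covered in the answer, included only because the standard library exposes both variants.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values

mu = sum(data) / len(data)                    # mean of the data
squared_devs = [(x - mu) ** 2 for x in data]  # squared deviations from the mean
pop_var = sum(squared_devs) / len(data)       # average squared deviation

# The standard library agrees with the manual computation.
lib_pop_var = pvariance(data)

# Sample variance divides by n - 1 rather than n (not discussed in the answer).
sample_var = variance(data)

print(pop_var, lib_pop_var, sample_var)
```

Which divisor is appropriate depends on whether the data are treated as a full population or as a sample from one.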

3

What is the relationship between standard deviation and variance?

Answer

The standard deviation is the square root of the variance. Taking the square root brings the dispersion measure back to the unit of the original data, which makes it easier to interpret. For example, if the data are in euros, the variance is in squared euros (hard to interpret), while the standard deviation is in euros. The standard deviation is therefore preferred for communicating dispersion intuitively.
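A minimal sketch of this relationship, assuming a small set of hypothetical prices in euros and the population form of both measures:

```python
import math
from statistics import pstdev, pvariance

# Hypothetical prices in euros.
prices_eur = [10, 12, 14, 16, 18]

var_eur2 = pvariance(prices_eur)  # population variance, in squared euros
std_eur = math.sqrt(var_eur2)     # back in euros

# pstdev computes the same quantity directly.
lib_std_eur = pstdev(prices_eur)

print(f"variance = {var_eur2} EUR^2, standard deviation = {std_eur:.2f} EUR")
```

The standard deviation can be read directly against the prices themselves, which the variance in squared euros does not allow.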

4

What does the first quartile (Q1) of a distribution represent?

5

How do you interpret a Pearson correlation coefficient of -0.85?
