
Descriptive Statistics
Mean, median, standard deviation, quartiles, distributions, correlations, outliers, skewness
1. Which measure of central tendency is most appropriate for data containing extreme values (outliers)?
Answer
The median is the most robust measure of central tendency against outliers. It is the middle value of the sorted data, so a few extreme values do not shift it. The mean, by contrast, sums all values, so a single very large or very small value pulls it toward that extreme, whereas the median depends only on the position of the values. For example, for salaries in a company with a few highly paid executives, the median represents the typical salary better than the mean does.
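A minimal sketch using Python's standard `statistics` module, assuming a small set of invented salaries (the numbers are illustrative only, not real data). It compares the mean and median, then makes one executive salary even more extreme to show which measure reacts.

```python
from statistics import mean, median

# Hypothetical monthly salaries in euros: eight employees and two executives.
salaries = [2800, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 25000, 40000]

# Replace the largest salary with an even more extreme one.
more_extreme = salaries[:-1] + [400_000]

for label, data in [("original", salaries), ("more extreme", more_extreme)]:
    # The mean moves with the outlier; the median stays at the middle of the sorted values.
    print(f"{label:>12}: mean = {mean(data):>8.0f}, median = {median(data):>6.0f}")
```

With these values the median stays at 3350 in both cases, while the mean jumps from 9090 to 45090 and describes none of the employees well.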
2. What is the variance of a dataset?
Answer
Variance measures the dispersion of data around their mean. It is calculated as the average of the squared deviations from the mean; this is the population variance, while the sample variance divides by n − 1 instead of n. Squaring makes every deviation non-negative, so positive and negative deviations cannot cancel out, and it gives more weight to values far from the mean. The unit of the variance is the square of the data's unit, which is why the standard deviation (the square root of the variance) is often used to express dispersion in the original unit.
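A minimal sketch of the calculation, assuming a small invented dataset: it computes the squared deviations by hand, derives both the population version (divide by n) and the sample version (divide by n − 1), and compares them with `statistics.pvariance` and `statistics.variance` from Python's standard library.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mu = sum(data) / n                            # mean = 5.0
squared_devs = [(x - mu) ** 2 for x in data]  # squared deviations from the mean

pop_var = sum(squared_devs) / n               # population variance: divide by n
sample_var = sum(squared_devs) / (n - 1)      # sample variance: divide by n - 1

print(f"population variance: {pop_var} (statistics.pvariance: {pvariance(data)})")
print(f"sample variance:     {sample_var:.4f} (statistics.variance: {variance(data):.4f})")
```

Here the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7 ≈ 4.571.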
3. What is the relationship between standard deviation and variance?
Answer
The standard deviation is the square root of the variance. Taking the square root brings the dispersion measure back to the original unit of the data, which makes it easier to interpret. For example, if the data are in euros, the variance is in squared euros (hard to interpret), while the standard deviation is in euros. This is why the standard deviation is preferred for communicating dispersion intuitively.
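A short sketch of the euro example, assuming invented daily expenses: the population variance comes out in squared euros, and its square root matches `statistics.pstdev` and is back in euros.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical daily expenses in euros.
expenses = [20, 40, 40, 40, 50, 50, 70, 90]

var_sq_eur = pvariance(expenses)  # population variance, in squared euros
std_eur = math.sqrt(var_sq_eur)   # standard deviation, back in euros

print(f"variance: {var_sq_eur} EUR^2")
print(f"standard deviation: {std_eur} EUR (statistics.pstdev: {pstdev(expenses)})")
```

A variance of 400 squared euros says little on its own, while a standard deviation of 20 euros around a mean of 50 euros is immediately readable.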
4. What does the first quartile (Q1) of a distribution represent?
5. How to interpret a Pearson correlation coefficient of -0.85?