
Data Cleaning
Missing values, duplicates, outliers, business rules, transformation, data quality
1. What is a missing value in a dataset?
Answer
A missing value represents absent or unfilled data in a field. It can appear as an empty cell, NULL in a database, or NaN in a DataFrame. Identifying missing values is the first step in data cleaning because they can distort statistical analyses and aggregations.
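As a minimal sketch of how missing values can distort an aggregate, the snippet below uses plain Python. The sample rows and the helper name `is_missing` are hypothetical; `None` stands in for a NULL or empty cell and `float("nan")` for a NaN.

```python
import math
from statistics import mean

# Hypothetical sample rows: None stands in for NULL / an empty cell,
# float("nan") for a NaN produced by a numeric library.
rows = [
    {"customer": "Ana", "amount": 10.0},
    {"customer": "Ben", "amount": None},
    {"customer": "Cal", "amount": float("nan")},
    {"customer": "Dee", "amount": 50.0},
]

def is_missing(value):
    """Return True for None or NaN (illustrative helper, not a library function)."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)

# Step 1: identify how many values are missing.
missing_count = sum(is_missing(r["amount"]) for r in rows)

# Step 2: compare two ways of averaging the same column.
present = [r["amount"] for r in rows if not is_missing(r["amount"])]
mean_ignoring_missing = mean(present)            # averages only 10 and 50
mean_missing_as_zero = sum(present) / len(rows)  # treats the two gaps as 0

print(missing_count, mean_ignoring_missing, mean_missing_as_zero)
```

The two averages differ, which is why the handling of missing values should be an explicit decision rather than a side effect of the tool's defaults.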
2. What is the difference between a NULL value and an empty string in a database?
Answer
NULL means the value is unknown or does not exist, while an empty string is a known value that happens to be empty. This distinction is fundamental in SQL because NULL cannot be compared with the = operator (IS NULL must be used), whereas an empty string can be compared normally with = ''.
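The comparison behaviour described above can be checked with a small sketch using Python's built-in `sqlite3` module. The `users` table, its `middle_name` column, and the helper `ids` are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database with one NULL, one empty string, and one regular value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, middle_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, None), (2, ""), (3, "Lee")],
)

def ids(where_clause):
    """Return the ids matching a WHERE clause (illustrative helper)."""
    query = f"SELECT id FROM users WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(query)]

# "= NULL" evaluates to NULL rather than true, so it matches no rows.
equals_null = ids("middle_name = NULL")
# IS NULL is the correct test for a NULL value.
is_null = ids("middle_name IS NULL")
# An empty string is an ordinary value and can be compared with =.
equals_empty = ids("middle_name = ''")

print(equals_null, is_null, equals_empty)
```

This sketch reflects SQLite's behaviour. Engines can differ: Oracle, for example, treats an empty string as NULL, so it is worth verifying on the target database.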
3. What is a duplicate in a dataset?
Answer
A duplicate is a record that appears more than once in a dataset, either exactly (all columns identical) or partially (certain key columns identical). Duplicates distort counts, sums, and averages. Their detection typically relies on identifying key columns that should be unique.
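The difference between exact and partial duplicates can be sketched in plain Python as follows. The order records, the assumption that `order_id` is the key column, and the helper `duplicated_keys` are all hypothetical.

```python
from collections import Counter

# Hypothetical order records; "order_id" is assumed to be the key column.
rows = [
    {"order_id": 1, "customer": "Ana", "amount": 10.0},
    {"order_id": 2, "customer": "Ben", "amount": 20.0},
    {"order_id": 2, "customer": "Ben", "amount": 20.0},  # exact duplicate
    {"order_id": 3, "customer": "Cal", "amount": 30.0},
    {"order_id": 3, "customer": "Cal", "amount": 35.0},  # same key, different amount
]

def duplicated_keys(records, columns):
    """Return the value tuples of `columns` that occur more than once."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Exact duplicates: every column is identical.
exact = duplicated_keys(rows, ["order_id", "customer", "amount"])
# Partial duplicates: only the key column is identical.
partial = duplicated_keys(rows, ["order_id"])

# Sums computed before deduplication are inflated by the repeated rows.
total_with_duplicates = sum(r["amount"] for r in rows)

print(exact, partial, total_with_duplicates)
```

Checking only the key column flags order 3 as well as order 2, which the exact comparison misses. Which of the two conflicting rows for order 3 is correct is a business decision that the code cannot make.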
Which technique allows detecting exact duplicates in SQL?
What is an outlier in a dataset?