Data Science & ML

Advanced Pandas

GroupBy, merge, concat, pivot tables, time series, apply/transform, MultiIndex, performance

24 interview questions · Mid-Level
1

Which method allows applying multiple different aggregation functions to a single column with groupby?

Answer

The agg() method (alias aggregate()) applies multiple aggregation functions to the same column. Pass a list of functions such as ['sum', 'mean', 'count'] to get one output column per function, or pass a dictionary to specify different functions per column. This makes it possible to build a statistical summary in a single operation.
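As a minimal sketch of both forms, assuming a small made-up DataFrame (the column names category, sales, and price are illustrative, not from any real dataset):

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "sales": [10, 20, 5, 15, 40],
    "price": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# A list applies every function to the selected column:
# the result has one output column per function.
sales_stats = df.groupby("category")["sales"].agg(["sum", "mean", "count"])

# A dictionary chooses functions per column. Because one of the values
# is a list, the resulting columns form a MultiIndex.
per_column = df.groupby("category").agg({"sales": ["sum", "mean"], "price": "max"})

print(sales_stats)
print(per_column)
```

The list form is usually enough when only one column is being summarized; the dictionary form is useful when different columns need different statistics.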

2

How do you explicitly name the resulting columns during a groupby aggregation using named aggregation syntax?

Answer

Named aggregation passes keyword arguments to agg(): each keyword is the output column name, and each value is a (column, aggregation function) tuple or a pd.NamedAgg. For example: df.groupby('category').agg(total_sales=('sales', 'sum'), avg_price=('price', 'mean')). This approach produces explicit, readable column names and avoids MultiIndex columns, which can complicate subsequent processing.
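A runnable version of that example might look like the following sketch. The data is invented for illustration, and the second keyword uses pd.NamedAgg only to show that it is interchangeable with a plain tuple:

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b"],
    "sales": [10, 20, 5],
    "price": [1.0, 3.0, 2.0],
})

summary = df.groupby("category").agg(
    # keyword = output column name, value = (input column, aggregation)
    total_sales=("sales", "sum"),
    # pd.NamedAgg is the explicit equivalent of the tuple form
    avg_price=pd.NamedAgg(column="price", aggfunc="mean"),
)

print(summary)
```

The resulting columns are the flat names total_sales and avg_price, so no column flattening step is needed afterwards.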

3

What is the main difference between transform() and apply() in a groupby context?

Answer

transform() returns a result with the same length as the input, aligned to the original index, which makes it ideal for adding group statistics to each row (e.g., the group mean). apply() is more flexible and can return a result of a different size, but it is generally slower. Use transform() for operations such as group normalization or z-score calculation.
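The difference in output shape can be seen in a short sketch, assuming made-up data with hypothetical team and score columns:

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "team": ["x", "x", "y", "y", "y"],
    "score": [1.0, 3.0, 2.0, 4.0, 6.0],
})

grouped = df.groupby("team")["score"]

# transform broadcasts each group's statistic back to every row of that
# group, so the result aligns with the original index.
df["team_mean"] = grouped.transform("mean")
df["centered"] = df["score"] - df["team_mean"]

# apply can return one value per group, so here the result has one row
# per team rather than one row per original record.
score_range = grouped.apply(lambda s: s.max() - s.min())

print(df)
print(score_range)
```

Here transform yields five values (one per row), while apply yields two (one per team).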

4

How do you filter groups in a groupby to keep only those that satisfy a condition (for example, groups with more than 10 elements)?

5

What is the difference between pd.merge() with how='left' and how='inner'?
