Question 1

What is the main advantage of the attention mechanism over RNNs for sequence processing?

Accepted Answer

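The attention mechanism gives every position direct access to every other position in the sequence, which removes the sequential bottleneck of RNNs. An RNN must propagate information step by step: the hidden state at position t cannot be computed before position t-1, and a signal between two distant tokens has to pass through every recurrent step in between. Attention instead computes scores between all pairs of positions at once. This lets the whole sequence be processed in parallel during training, and it keeps the path between any two tokens to a single step. As a result, long-range dependencies are easier to learn, because gradients no longer have to flow through a long chain of recurrent steps where they tend to vanish or explode.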
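To make the contrast concrete, here is a minimal sketch on scalar toy models (an assumption for illustration, not a real architecture). The RNN uses hypothetical weights w_h and w_x, and its loop cannot be reordered because each step needs the previous state. The helper rnn_grad_last_wrt_first applies the chain rule through every step, so the sensitivity of the last state to the first input is a product of many factors and shrinks quickly. The attention helper (hypothetical name attention_weights, with no learned projections) scores position i against every position in one step, so the last token reaches the first token directly.

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h, states = 0.0, []
    for x in xs:  # inherently sequential: step t needs h_{t-1}
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def rnn_grad_last_wrt_first(xs, w_h=0.5, w_x=1.0):
    """d h_T / d x_1 via the chain rule through every intermediate step."""
    states = rnn_states(xs, w_h, w_x)
    grad = w_x * (1 - states[0] ** 2)  # d h_1 / d x_1
    for h in states[1:]:
        grad *= w_h * (1 - h ** 2)     # d h_t / d h_{t-1}
    return grad

def attention_weights(i, xs):
    """Toy scalar self-attention: position i scores every position in one step."""
    scores = [xs[i] * xj for xj in xs]
    m = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

xs = [0.5] * 20
print(rnn_grad_last_wrt_first(xs))   # tiny: bounded by 0.5 ** 19 with these weights
print(attention_weights(19, xs)[0])  # direct weight from the last position to the first
```

Because each call to attention_weights depends only on the inputs and not on other outputs, all positions could be computed in parallel. The RNN loop offers no such option.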

Question 2

In the attention mechanism, what do the Query (Q), Key (K) and Value (V) vectors represent?

Accepted Answer

The Query represents what a token is looking for, the Key represents what each token offers to be matched against, and the Value holds the information that is actually retrieved. A score is computed between the query and every key (a dot product in the Transformer) to measure relevance. The scores are normalized with a softmax and then used as weights in a weighted sum of the Values. The terminology comes from information retrieval systems, where a query is compared against keys to retrieve the associated values.
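The retrieval analogy can be sketched as a "soft" dictionary lookup. This minimal version assumes plain Python lists and hypothetical helper names (softmax, soft_lookup), and for clarity it omits the sqrt(d_k) scaling discussed in the next question. When the query strongly matches one key, the output approaches that key's value, much like an ordinary dictionary lookup. When the query matches several keys equally, their values are blended.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_lookup(query, keys, values):
    """Attention as a soft dictionary lookup (unscaled dot-product scores)."""
    # 1. Compare the query with every key (what each token offers as a match)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # 2. Turn the scores into weights that sum to 1
    weights = softmax(scores)
    # 3. Return the weighted sum of the values (the information retrieved)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, w = soft_lookup([5.0, 0.0], keys, values)
print(w)    # almost all weight on the first key
print(out)  # close to values[0]
```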

Question 3

What is the formula for scaled dot-product attention and why divide by the square root of dk?

Accepted Answer

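The formula is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key (and query) vectors and the softmax is applied row by row, over the keys. Dividing by sqrt(d_k) is crucial because dot products grow with the dimension. If the components of q and k are independent with mean 0 and variance 1, then q·k has variance d_k. Large scores push the softmax into saturated regions where it is almost one-hot and its gradients are very small. Dividing by sqrt(d_k) brings the variance of the scores back to about 1, which keeps learning stable and efficient.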
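The sketch below implements the formula with plain lists of lists (rows are tokens) and then checks the variance argument empirically. The helper names are hypothetical. The check assumes query and key components drawn i.i.d. from N(0, 1), which is the usual assumption behind the sqrt(d_k) justification. Real models use tensor libraries and learned projections instead.

```python
import math
import random

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax applied over the keys."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # one distribution per query
    return matmul(weights, V)

def dot_variance(d_k, trials=1000, scale=False, seed=0):
    """Empirical variance of q.k for q, k with i.i.d. N(0, 1) components."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        q = [rng.gauss(0, 1) for _ in range(d_k)]
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        s = sum(a * b for a, b in zip(q, k))
        vals.append(s / math.sqrt(d_k) if scale else s)
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / trials

# Unscaled variance grows roughly like d_k; the scaled variance stays near 1
for d_k in (4, 64, 256):
    print(d_k, round(dot_variance(d_k), 1), round(dot_variance(d_k, scale=True), 2))
```

Without the division, scores with variance around 256 would make the softmax nearly one-hot for most rows, which is the small-gradient regime the answer describes.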

Transformers & Attention

What is the fundamental difference between attention and self-attention?

Why use multi-head attention rather than a single attention head?
