
Transformers & Attention
Attention mechanism, self-attention, multi-head attention, Transformer architecture, positional encoding
1. What is the main advantage of the attention mechanism over RNNs for sequence processing?
Answer
Attention gives every position direct access to every other position in the sequence, which removes the sequential bottleneck of RNNs. An RNN must propagate information step by step through its hidden state, so signals between distant tokens pass through many steps and their gradients tend to vanish. Attention computes connections between all pairs of positions in a single step, which allows the whole sequence to be processed in parallel and makes long-range dependencies easier to capture.
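The contrast can be sketched with NumPy. This is a toy illustration, not a trained model: the sizes and the weight matrices `W_x` and `W_h` are hypothetical, and the attention half uses the raw embeddings as queries, keys and values, without learned projections.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4                      # toy sizes, chosen arbitrarily
x = rng.normal(size=(seq_len, d))      # one embedding per position

# RNN-style processing: step t needs the hidden state from step t-1,
# so the loop cannot be parallelized across positions.
W_x = rng.normal(size=(d, d)) * 0.1    # hypothetical input weights
W_h = rng.normal(size=(d, d)) * 0.1    # hypothetical recurrent weights
h = np.zeros(d)
rnn_states = []
for t in range(seq_len):
    h = np.tanh(x[t] @ W_x + h @ W_h)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)      # (seq_len, d)

# Attention-style processing: one matrix product scores every pair of
# positions at once, so the first and last positions are linked directly.
scores = x @ x.T / np.sqrt(d)                     # (seq_len, seq_len)
scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
attn_out = weights @ x                            # (seq_len, d)
```

The RNN half needs `seq_len` dependent iterations, while the attention half is a fixed number of matrix operations regardless of how far apart two positions are.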
2. In the attention mechanism, what do the Query (Q), Key (K) and Value (V) vectors represent?
Answer
Query represents what a token is looking for, Key represents what each token offers as a match, and Value holds the information to retrieve. An attention score is computed between each Query and each Key to measure relevance; after softmax, these scores become weights used to average the Values. The analogy comes from information retrieval, where a query is compared against keys to retrieve the associated values.
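The retrieval analogy can be made concrete as a "soft" dictionary lookup. The sketch below uses made-up keys, values and a query chosen by hand so that the query clearly matches the first key; in a real model these vectors would come from learned projections of the token embeddings.

```python
import numpy as np

# Three stored items: each has a key (what it matches) and a value (what it returns).
keys = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [-1.0, 0.0]])
values = np.array([[10.0],
                   [20.0],
                   [30.0]])

# A query pointing strongly in the direction of the first key.
query = np.array([5.0, 0.0])

# Compare the query with every key (dot product), then normalize with softmax.
scores = keys @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# The result is a weighted average of the values, dominated by the best match.
retrieved = weights @ values
```

Unlike a hard dictionary lookup, every value contributes a little; here almost all of the weight lands on the first item, so `retrieved` stays close to its value of 10.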
3. What is the formula for scaled dot-product attention and why divide by the square root of dk?
Answer
The formula is Attention(Q, K, V) = softmax(QK^T / sqrt(dk)) V. The scaling matters because dot products of high-dimensional vectors tend to have large magnitudes: if the components of a query and a key are independent with zero mean and unit variance, their dot product has variance dk. Large scores push softmax into saturated regions where gradients are very small. Dividing by sqrt(dk) brings the variance of the scores back to about 1, which keeps training stable.
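A minimal NumPy sketch of the formula, followed by a quick empirical check of the variance argument. The function name, the toy shapes and the choice of 256 for the key dimension (`d_k` in the code, dk in the text) are illustrative assumptions; the check draws queries and keys from a standard normal distribution, which is the idealized setting behind the sqrt(dk) argument rather than a property of any trained model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for 2-D Q (n, d_k), K (m, d_k), V (m, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n, m)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)

# Toy example: 3 queries attending over 5 key/value pairs.
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
out, w = scaled_dot_product_attention(Q, K, V)

# Variance check: unit-variance components give dot products with variance ~d_k.
d_k = 256
q = rng.normal(size=(10_000, d_k))
k = rng.normal(size=(10_000, d_k))
raw = (q * k).sum(axis=-1)        # one dot product per row
scaled = raw / np.sqrt(d_k)
raw_var, scaled_var = raw.var(), scaled.var()
```

Under these assumptions `raw_var` comes out near 256 and `scaled_var` near 1, which is the effect the division by sqrt(dk) is meant to achieve.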
What is the fundamental difference between attention and self-attention?
Why use multi-head attention rather than a single attention head?