
NLP & Hugging Face
Tokenization, embeddings, BERT, GPT, Hugging Face Transformers, fine-tuning, pipelines, inference
1. What is the main function of tokenization in natural language processing?
Answer
Tokenization splits raw text into smaller units called tokens, which can be words, subwords, or characters. This step is essential because language models operate on numbers, not raw strings: each token is mapped to an integer ID from the model's vocabulary, and these IDs are what the model actually processes.
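To make the two steps (splitting, then mapping to IDs) concrete, here is a minimal pure-Python sketch of a toy word-level tokenizer. The names `tokenize`, `build_vocab` and `encode`, the regular expression, and the `[UNK]` fallback at id 0 are illustrative assumptions, not the API of any library. In Hugging Face Transformers, a pretrained tokenizer (for example one loaded with `AutoTokenizer.from_pretrained`) performs both steps using a learned subword vocabulary instead.

```python
import re

def tokenize(text: str) -> list[str]:
    # Toy word-level scheme: lowercase, then split into words and punctuation marks
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(corpus: list[str]) -> dict[str, int]:
    # Reserve id 0 for the unknown token, then assign ids in order of first appearance
    vocab = {"[UNK]": 0}
    for sentence in corpus:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    # Map each token to its id, falling back to [UNK] for tokens never seen before
    return [vocab.get(token, vocab["[UNK]"]) for token in tokenize(text)]

vocab = build_vocab(["The cat sat on the mat.", "The dog sat."])
print(tokenize("The cat sat."))        # ['the', 'cat', 'sat', '.']
print(encode("The cat sat.", vocab))   # [1, 2, 3, 6]
print(encode("The bird sat.", vocab))  # [1, 0, 3, 6]
```

Because this vocabulary is word-level, the unseen word "bird" collapses to id 0 (`[UNK]`) and its content is lost, which is exactly the limitation subword methods address.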
2. What is the main advantage of the BPE (Byte Pair Encoding) algorithm over word-level tokenization?
Answer
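BPE handles unknown (out-of-vocabulary) words by decomposing them into known subword units. Word-level tokenization replaces any word missing from its vocabulary with a special [UNK] token, losing its content; BPE instead represents the word as a sequence of subwords from its vocabulary, falling back to single characters where needed. As long as the word's base characters (or bytes, in byte-level BPE) are in the vocabulary, it can be encoded, which lets the model generalize to words never seen during training.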
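The following sketch contrasts the two behaviours on unseen words. The vocabulary and merge rules are hypothetical and hand-picked for illustration (not learned from real data), and `bpe_segment` simply applies merges to one word in priority order, a simplification of how production BPE tokenizers encode text.

```python
def word_level_lookup(word: str, vocab: set[str]) -> str:
    # Word-level tokenization: any word not in the vocabulary becomes [UNK]
    return word if word in vocab else "[UNK]"

def bpe_segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    # Start from single characters, then apply each merge rule in priority order
    symbols = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Merge the adjacent pair if it matches the current rule
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# Hypothetical, hand-picked vocabulary and merge rules (not learned from real data)
word_vocab = {"low", "lower", "newest"}
merges = [("l", "o"), ("lo", "w"), ("e", "s"), ("es", "t"), ("e", "r")]

print(word_level_lookup("lowest", word_vocab))  # [UNK]
print(bpe_segment("lowest", merges))            # ['low', 'est']
print(bpe_segment("slower", merges))            # ['s', 'low', 'er']
```

"lowest" is lost as `[UNK]` at the word level but becomes `['low', 'est']` with BPE. Segmentation can only fail if a base character is itself missing from the vocabulary; byte-level BPE, as used by GPT-2, avoids this by starting from the 256 possible byte values.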
3. What is the fundamental difference between WordPiece and BPE for vocabulary construction?
Answer
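At each step, BPE merges the most frequent adjacent token pair, while WordPiece chooses the merge that most increases the likelihood of the training corpus. WordPiece thus uses a probabilistic criterion rather than raw frequency: a pair whose parts seldom occur apart can be merged before a more frequent pair whose parts are common on their own, so the two algorithms can build different vocabularies and produce different splits from the same corpus.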
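The difference already shows up in the first merge. This minimal sketch scores candidate pairs on a toy corpus whose word counts are chosen for illustration. For WordPiece it assumes the scoring rule described in the Hugging Face course, freq(pair) / (freq(first) × freq(second)), a common approximation of the likelihood criterion. It selects only one merge and does not implement full training or WordPiece's `##` continuation prefixes.

```python
from collections import Counter

# Toy corpus: each word is a tuple of its current symbols, mapped to its frequency
corpus = {
    ("h", "u", "g"): 10,
    ("p", "u", "g"): 5,
    ("p", "u", "n"): 12,
    ("b", "u", "n"): 4,
    ("h", "u", "g", "s"): 5,
}

def count_pairs_and_symbols(corpus: dict[tuple[str, ...], int]) -> tuple[Counter, Counter]:
    # Count adjacent symbol pairs and individual symbols, weighted by word frequency
    pairs, symbols = Counter(), Counter()
    for word, freq in corpus.items():
        for symbol in word:
            symbols[symbol] += freq
        for pair in zip(word, word[1:]):
            pairs[pair] += freq
    return pairs, symbols

def best_bpe_merge(corpus) -> tuple[str, str]:
    pairs, _ = count_pairs_and_symbols(corpus)
    # BPE: the most frequent adjacent pair wins
    return max(pairs, key=pairs.get)

def best_wordpiece_merge(corpus) -> tuple[str, str]:
    pairs, symbols = count_pairs_and_symbols(corpus)
    # WordPiece-style score: pair count divided by the counts of its two parts
    return max(pairs, key=lambda p: pairs[p] / (symbols[p[0]] * symbols[p[1]]))

print(best_bpe_merge(corpus))        # ('u', 'g'), 20 occurrences
print(best_wordpiece_merge(corpus))  # ('g', 's'), score 5 / (20 * 5) = 0.05
```

BPE picks ("u", "g") because it is the most frequent pair. The WordPiece-style score prefers ("g", "s"): "s" never appears without a preceding "g", so merging them scores higher even though the pair is rarer, while every pair involving the very common "u" is penalized.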
What is the main difference between static word embeddings (Word2Vec) and contextual embeddings (BERT)?
What are the two pre-training tasks used by BERT?