Data Science & ML

Web Scraping

BeautifulSoup, requests, HTML parsing, XPath, CSS selectors, APIs, pagination, best practices

18 interview questions · Mid-Level
1

Which Python library is typically used to make HTTP requests before parsing HTML content?

Answer

The requests library is the de facto standard in Python for making HTTP requests. It exposes GET, POST, and the other HTTP verbs through a small, readable API (requests.get(), requests.post(), and so on). BeautifulSoup does not make HTTP requests; it only parses HTML that has already been retrieved.
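
As a minimal sketch of the API described above: the helper name fetch_html, the 10-second timeout, and the example.com URL are illustrative assumptions, not part of the original answer. The second half builds a request without sending it, so the verb-based API can be inspected without network access.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


# Build (but do not send) a POST request to inspect what would go on the wire.
# The URL is a placeholder chosen for illustration.
prepared = requests.Request("POST", "https://example.com/search", data={"q": "python"}).prepare()
print(prepared.method, prepared.url)
```

The string returned by fetch_html() is what would then be handed to a parser such as BeautifulSoup.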

2

What is the main role of BeautifulSoup in a web scraping project?

Answer

BeautifulSoup is an HTML/XML parsing library for navigating, searching, and extracting data from a document. It builds a tree of the document's elements, which you can search with methods such as find() and find_all(). It does not make HTTP requests.
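
The tree navigation described above can be sketched as follows. The HTML snippet is made up for illustration, and the built-in "html.parser" backend is an assumed choice (lxml or html5lib would also work if installed).

```python
from bs4 import BeautifulSoup

# Inline sample document (hypothetical content, no HTTP request involved).
html = """
<html><body>
  <h1 id="title">Quarterly report</h1>
  <p class="summary">Revenue grew.</p>
  <a href="/q1">Q1</a>
  <a href="/q2">Q2</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

heading = soup.find("h1")                            # first <h1> in the tree
title_text = heading.get_text(strip=True)            # text content, whitespace trimmed
links = [a["href"] for a in soup.find_all("a")]      # attribute access on each <a>
parent_name = heading.parent.name                    # navigate upward in the tree

print(title_text, links, parent_name)
```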

3

Which BeautifulSoup method finds all elements matching given criteria?

Answer

The find_all() method returns a list of all elements matching the specified criteria (tag, attributes, class, etc.). The find() method returns only the first matching element, or None if nothing matches. select() returns all elements matching a CSS selector, and select_one() returns only the first such element.
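
A short side-by-side sketch of the four methods, run against a made-up list (the markup and class names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="item">Alpha</li>
  <li class="item featured">Beta</li>
  <li class="item">Gamma</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all(): every match; class_ matches any one of an element's classes.
all_items = soup.find_all("li", class_="item")

# find(): first match only, or None when nothing matches.
first_item = soup.find("li", class_="item")
missing = soup.find("table")

# select(): every match for a CSS selector; select_one(): first match only.
featured = soup.select("li.item.featured")
first_css = soup.select_one("ul > li")

print(len(all_items), first_item.get_text(), missing,
      [li.get_text() for li in featured], first_css.get_text())
```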

4

How do you specify a custom User-Agent header when making a request with requests?

5

Which attribute of the Response object returns the HTML content as text?
