Simple Web Scraping Using requests, Beautiful Soup, and lxml in Python

Get started with web scraping in Python

Lynn G. Kwong
9 min read · Feb 4, 2022

When it comes to web scraping in Python, the first package that comes to mind is Scrapy. However, Scrapy is better suited to larger scraping projects, and it has a learning curve that takes time to climb. For simple tasks where you only need to get data from a single webpage directly, using Scrapy can be overkill. In such cases, we can use the requests package together with Beautiful Soup or lxml to scrape the content we need very quickly.

Image by Clker-Free-Vector-Images on Pixabay.

Before we get started, we need to install the packages needed for simple web scraping. The requests library will be used to download the webpage content. And if you prefer a Pythonic way of extracting data from a webpage using properties and methods of constructed classes, you can install and use the Beautiful Soup package. Beautiful Soup also supports CSS selectors that are useful for complex and nested elements. However, Beautiful Soup does not support XPath. Therefore, if you are more used to using XPath, you should use the lxml package instead.
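To make the difference concrete, here is a minimal sketch that extracts the same element with both libraries. The HTML snippet and the `quote`/`text` class names are made up for illustration; in practice the page content would come from a `requests.get(...)` call instead of a string.

```python
from bs4 import BeautifulSoup
from lxml import html

# A tiny stand-in for a downloaded page (normally requests.get(url).text)
page = """
<html><body>
  <div class="quote"><span class="text">Hello, world!</span></div>
</body></html>
"""

# Beautiful Soup: Pythonic navigation plus CSS selectors
soup = BeautifulSoup(page, "html.parser")
print(soup.select_one("div.quote span.text").text)  # Hello, world!

# lxml: the same element located with an XPath expression
tree = html.fromstring(page)
print(tree.xpath("//div[@class='quote']/span[@class='text']/text()")[0])  # Hello, world!
```

Both calls return the same text; which one you prefer mostly depends on whether you are more comfortable with CSS selectors or XPath.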

It is recommended to create a virtual environment and install the packages there so they won’t mess up system libraries. For simplicity, we will use conda to create the virtual environment. To make it easier to…
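A sketch of the setup described above; the environment name `scraping` and the Python version are arbitrary choices, not requirements.

```shell
# Create and activate an isolated conda environment (name is arbitrary)
conda create -n scraping python=3.10 -y
conda activate scraping

# Install the three packages used in this article
pip install requests beautifulsoup4 lxml
```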
