Before getting started, we need to install the packages needed for web scraping. It is recommended to create a virtual environment and install the packages there so they won't interfere with system libraries. For simplicity, we will use conda to create the virtual environment. To make it easier to run Python code interactively, we will also install IPython:
(base) $ conda create --name js_scrape python=3.10
(base) $ conda activate js_scrape
(js_scrape) $ pip install -U requests lxml
(js_scrape) $ pip install ipython
(js_scrape) $ ipython
- requests — Used to download the webpage content.
- lxml — Used to parse the rendered HTML markup and extract data using XPath.
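As a quick sanity check of the toolchain, here is a minimal sketch of how lxml's XPath support extracts data from HTML. The inline snippet below is a hypothetical stand-in for a page body that requests would normally download for us:

```python
from lxml import html

# Hypothetical HTML snippet standing in for a downloaded page.
sample = """
<html><body>
  <ul id="proxies">
    <li class="proxy">203.0.113.1:8080</li>
    <li class="proxy">203.0.113.2:3128</li>
  </ul>
</body></html>
"""

# Parse the markup into an element tree.
tree = html.fromstring(sample)

# XPath: select the text of every <li> with class "proxy".
proxies = tree.xpath('//li[@class="proxy"]/text()')
print(proxies)  # → ['203.0.113.1:8080', '203.0.113.2:3128']
```

With a real page, you would pass `requests.get(url).text` to `html.fromstring()` instead of the sample string.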
Let’s start by exploring the ProxyCrawl Crawling API.