How to scrape JavaScript webpages using Selenium in Python

Render JavaScript webpages by yourself

Lynn Kwong
5 min read · Feb 5, 2022

Due to the increasing popularity of modern JavaScript frameworks such as React, Angular, and Vue, more and more websites are now built dynamically with JavaScript. This poses a challenge for web scraping because the content is generated in the browser, so the HTML markup we want is not present in the page source returned by the server. Therefore, we cannot scrape these JavaScript webpages directly and need to render them as regular HTML markup first.

In a previous post, we introduced how to scrape JavaScript webpages with ProxyCrawl, a handy web service for rendering and scraping such pages. However, ProxyCrawl is not free and can be costly if a large number of JavaScript webpages need to be scraped frequently. In this post, we will introduce how to use Selenium to render JavaScript webpages ourselves. Selenium is an open-source library primarily used for automating web browsers, typically to test web applications. Here, however, we will not use it to automate frontend testing, but only to render a JavaScript webpage as HTML markup, which can then be used for web scraping.

The demo site to be used in this tutorial is http://quotes.toscrape.com/js/. If you open this website, right-click on the webpage and select “View page source”, you will only see some JavaScript code and not the HTML markup of the quotes. Luckily, for this site the data is included in a <script> tag. However, for many websites, especially those created with Angular, there is little data in the JavaScript code and you must render the page before you can scrape it.
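To make the difference between page source and rendered markup concrete, here is a minimal sketch that counts the quote elements a plain HTML parser can see. It uses only the standard library so it can be run before anything is installed. The names QuoteDivCounter, count_quote_divs and fetch_page_source, as well as the two sample strings, are illustrative assumptions and simplified stand-ins, not the demo site's actual source.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class QuoteDivCounter(HTMLParser):
    """Count <div class="quote"> elements that exist as real markup."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quote" in classes:
            self.count += 1


def count_quote_divs(html):
    """Return how many quote <div> elements a plain HTML parser can see."""
    parser = QuoteDivCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def fetch_page_source(url):
    """Download the raw, unrendered page source (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


# Hypothetical, simplified stand-ins for a static page and a JavaScript page.
STATIC_HTML = '<body><div class="quote"><span class="text">Hi</span></div></body>'
JS_HTML = """<body><script>
document.write("<div class='quote'><span class='text'>Hi</span></div>");
</script></body>"""

print(count_quote_divs(STATIC_HTML))  # 1: the markup is really there
print(count_quote_divs(JS_HTML))      # 0: the markup only exists as script text
```

Passing the result of fetch_page_source("http://quotes.toscrape.com/js/") to count_quote_divs is a quick way to check whether a page needs rendering: if the count is zero while the browser clearly shows quotes, the markup is being produced by JavaScript.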

Before getting started, we need to install the packages needed for web scraping. It is recommended to create a virtual environment and install the packages there so they won’t interfere with system libraries. For simplicity, we will use conda to create the virtual environment. To make it easier to run Python code interactively, we will also install IPython:

(base) $ conda create --name selenium python=3.10
(base) $ conda activate selenium
(selenium) $ pip install -U requests selenium lxml
(selenium) $ pip install ipython
(selenium)…
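
With the environment in place, rendering a page with Selenium can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Selenium 4 and a local Chrome installation with a matching driver, and the helper names make_headless_chrome and render_page_source are made up for illustration.

```python
def make_headless_chrome():
    """Create a headless Chrome WebDriver (assumes Chrome is installed)."""
    # Imported here so the rendering helper below can also be exercised
    # with a stand-in driver on machines without Selenium or Chrome.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    return webdriver.Chrome(options=options)


def render_page_source(driver, url):
    """Load url in the browser and return the HTML after JavaScript has run."""
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

Calling render_page_source(make_headless_chrome(), "http://quotes.toscrape.com/js/") should then return markup that can be handed to a parser such as lxml. Note that page_source reflects the DOM at the moment it is read, so pages that load their data asynchronously may additionally need an explicit wait (for example with Selenium's WebDriverWait) before the markup is complete.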

Lynn Kwong

I’m a Software Developer (https://superdataminer.com) keen on sharing thoughts, tutorials, and solutions for the best practice of software development.