Crawling a Website that loads content using JavaScript with Selenium WebDriver in Python
Selenium
Crawler

Selenium is a browser automation tool used primarily for testing web applications. It lets you simulate real user actions and interactions with your web applications. Selenium supports all the major browsers and operating systems, and there are bindings for all the popular programming languages. Its power is not restricted to testing web apps: it can also be used to crawl or scrape websites, in particular those that don't provide an API and load content lazily using JavaScript.

We will crawl the online merchant website www.jabong.com with Selenium using its Python bindings. Jabong loads more products as you scroll down a page. We will simulate this scrolling and then retrieve all the product titles and the corresponding links to the product detail pages.
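
To make the scrolling idea concrete, here is a minimal sketch of one way to keep scrolling until a lazily loaded page stops growing. It is not part of the script further down, which simply presses PAGE_DOWN a fixed number of times. The function name scroll_until_stable and its max_rounds and pause parameters are assumptions made for illustration; the only thing it needs from the browser object is the execute_script method that Selenium WebDriver instances provide.

```python
import time


def scroll_until_stable(browser, max_rounds=20, pause=1.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `browser` is assumed to be a Selenium WebDriver (any object with an
    execute_script method works). Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger lazy loading.
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to append more items
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, so assume the end was reached
        last_height = new_height
    return rounds
```

The max_rounds cap keeps the loop from running forever on pages that scroll endlessly, and pause may need tuning for slow connections.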

Reference:

Platform: Ubuntu/Debian

Setup:

sudo pip install selenium

sudo apt-get install xvfb

sudo pip install pyvirtualdisplay

We use pyvirtualdisplay, which is a wrapper around Xvfb and lets you run Firefox headlessly.

Page to Crawl:

[Image: screenshot of the product listing page]

A quick "Inspect Element" on one of the shoes above shows that each product is wrapped in a "div" element with class "hover-box", and that the title and link are held by an "a" element inside that "div".

[Image: screenshot of the Inspect Element view]
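
As a rough illustration of that structure, the sketch below parses a small, made-up HTML fragment with Python's standard html.parser module and collects the title and href of the first "a" inside each "hover-box" div. The sample markup, the example.com URLs, and the names HoverBoxParser and extract_products are hypothetical; only the "hover-box" class and the title/href attributes come from the description above. It needs no browser, so it is handy for checking the selection logic on its own.

```python
from html.parser import HTMLParser

# Hypothetical markup: only the "hover-box" class and the title/href
# attributes on the inner <a> come from the description in the text.
SAMPLE_HTML = """
<div class="hover-box">
  <a title="Example Running Shoes" href="http://www.example.com/shoes-1.html">...</a>
</div>
<div class="other">
  <a title="Not a product" href="http://www.example.com/about.html">...</a>
</div>
<div class="hover-box">
  <a title="Example Loafers" href="http://www.example.com/loafers-2.html">...</a>
</div>
"""


class HoverBoxParser(HTMLParser):
    """Collect (title, href) from the first <a> inside each div.hover-box."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # div nesting depth inside the current hover-box
        self.taken = False   # whether this hover-box already yielded a link
        self.products = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif "hover-box" in (attrs.get("class") or "").split():
                self.depth = 1
                self.taken = False
        elif tag == "a" and self.depth > 0 and not self.taken:
            self.products.append((attrs.get("title", ""), attrs.get("href", "")))
            self.taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_products(html):
    parser = HoverBoxParser()
    parser.feed(html)
    return parser.products
```

Calling extract_products(SAMPLE_HTML) returns the two product entries and skips the link inside the unrelated div.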

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from pyvirtualdisplay import Display


def correct_url(url):
    # Prepend a scheme if the caller passed a bare host name.
    if not url.startswith("http://") and not url.startswith("https://"):
        url = "http://" + url
    return url


def scrollDown(browser, numberOfScrollDowns):
    # Press PAGE_DOWN repeatedly so the page lazy-loads more products.
    body = browser.find_element(By.TAG_NAME, "body")
    for _ in range(numberOfScrollDowns):
        body.send_keys(Keys.PAGE_DOWN)
    return browser


def crawl_url(url, run_headless=True):
    display = None
    if run_headless:
        # Start a virtual X display so Firefox can run without a screen.
        display = Display(visible=0, size=(1024, 768))
        display.start()

    url = correct_url(url)
    browser = webdriver.Firefox()
    try:
        browser.get(url)
        browser = scrollDown(browser, 10)

        # Each product is wrapped in a div with class "hover-box".
        all_hover_elements = browser.find_elements(By.CLASS_NAME, "hover-box")

        for hover_element in all_hover_elements:
            # The title and link live on the first <a> inside the div.
            a_element = hover_element.find_element(By.TAG_NAME, "a")
            product_title = a_element.get_attribute("title")
            product_link = a_element.get_attribute("href")
            print(product_title, product_link)
    finally:
        # Always close the browser and the virtual display.
        browser.quit()
        if display is not None:
            display.stop()


if __name__ == '__main__':
    url = "http://www.jabong.com/men/shoes/new-products/"
    crawl_url(url)

Here is the output of the above script:

javascript:sendSizeFormWithSize('http://www.jabong.com/Nike-Liteforce-Ii-Mid-White-Sneakers-503958.html','6')
Nike Ballista Iv Msl Grey Running Shoes http://www.jabong.com/Nike-Ballista-Iv-Msl-Grey-Running-Shoes-503907.html
U.S. Polo Assn. Navy Blue Sneakers http://www.jabong.com/Us_Polo_Assn-Navy-Blue-Sneakers-518870.html
Nike Dewired Navy Blue Sneakers http://www.jabong.com/Nike-Dewired-Navy-Blue-Sneakers-503897.html
 javascript:sendSizeFormWithSize('http://www.jabong.com/Us_Polo_Assn-Black-Dress-Shoes-518868.html','6')
 javascript:sendSizeFormWithSize('http://www.jabong.com/Us_Polo_Assn-Destiny-Navy-Blue-Sneakers-518878.html','6')
United Colors of Benetton Black Boat Shoes http://www.jabong.com/United-Colors-of-Benetton-Black-Boat-Shoes-512546.html
Phosphorus Brown Loafers http://www.jabong.com/phosphorus-Brown-Loafers-506097.html
 javascript:sendSizeFormWithSize('http://www.jabong.com/United-Colors-of-Benetton-Blue-Sneakers-512548.html','6')
U.S. Polo Assn. Delta Beige Sneakers http://www.jabong.com/Us_Polo_Assn-Delta-Beige-Sneakers-518873.html
Asics Kayano 20 Black Running Shoes http://www.jabong.com/Asics-Kayano-20-Black-Running-Shoes-507673.html
Nike Air Max 2014 Blue Running Shoes http://www.jabong.com/Nike-Air-Max-2014-Blue-Running-Shoes-503964.html
Asics Excel 33 3 Navy Blue Running Shoes http://www.jabong.com/Asics-Excel-33-3-Navy-Blue-Running-Shoes-507674.html
Nike Air Relentless 3 Msl Blue Running Shoes http://www.jabong.com/Nike-Air-Relentless-3-Msl-Blue-Running-Shoes-503954.html
Asics Kayano 20 Navy Blue Running Shoes http://www.jabong.com/Asics-Kayano-20-Navy-Blue-Running-Shoes-507672.html
Nike Lunarinternationalist Grey Running Shoes http://www.jabong.com/Nike-Lunarinternationalist-Grey-Running-Shoes-503975.html
Nike Free 5.0+ Blue Running Shoes http://www.jabong.com/Nike-Free-50-Blue-Running-Shoes-503919.html
Nike Fs Lite Run Black Running Shoes http://www.jabong.com/Nike-Fs-Lite-Run-Black-Running-Shoes-503955.html
Nike Lunarinternationalist Blue Running Shoes http://www.jabong.com/Nike-Lunarinternationalist-Blue-Running-Shoes-503976.html
Nike Free Trainer 5.0 Grey Running Shoes http://www.jabong.com/Nike-Free-Trainer-50-Grey-Running-Shoes-503886.html
 javascript:sendSizeFormWithSize('http://www.jabong.com/Nike-Zoom-Terra-Kiger-Blue-Running-Shoes-503930.html','6')
Andrew Hill Brown Dress Shoes http://www.jabong.com/andrew-hill-Brown-Dress-Shoes-506110.html
Nike Lunar Forever 3 Msl Grey Running Shoes http://www.jabong.com/Nike-Lunar-Forever-3-Msl-Grey-Running-Shoes-503970.html
Asics Kayano 20 Red Running Shoes http://www.jabong.com/Asics-Kayano-20-Red-Running-Shoes-507671.html
Nike Free Trainer 5.0 Black Running Shoes http://www.jabong.com/Nike-Free-Trainer-50-Black-Running-Shoes-503915.html
Nike Chroma Thong Iii Green Slippers http://www.jabong.com/Nike-Chroma-Thong-Iii-Green-Slippers-503977.html
Nike Aquahype Blue Flip Flops http://www.jabong.com/Nike-Aquahype-Blue-Flip-Flops-503890.html
Z Collection Green Loafers http://www.jabong.com/z-collection-Green-Loafers-517513.html
Nike Flex 2013 Rn Black Running Shoes http://www.jabong.com/Nike-Flex-2013-Rn-Black-Running-Shoes-503916.html
U.S. Polo Assn. Brown Sneakers http://www.jabong.com/Us_Polo_Assn-Brown-Sneakers-518869.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506093.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506102.html
Nike Eliminate Ii Leather Grey Sneakers http://www.jabong.com/Nike-Eliminate-Ii-Leather-Grey-Sneakers-503909.html
Nike Fs Lite Trainer Blue Running Shoes http://www.jabong.com/Nike-Fs-Lite-Trainer-Blue-Running-Shoes-503952.html
Nike Suketo 2 Leather Red Sneakers http://www.jabong.com/Nike-Suketo-2-Leather-Red-Sneakers-503972.html
Nike Free Flyknit+ Red Running Shoes http://www.jabong.com/Nike-Free-Flyknit-Red-Running-Shoes-469765.html
 javascript:sendSizeFormWithSize('http://www.jabong.com/Nike-Zoom-Wildhorse-Blue-Running-Shoes-503931.html','6')
Nike Air Pegasus+ 30 Grey Running Shoes http://www.jabong.com/Nike-Air-Pegasus-30-Grey-Running-Shoes-503934.html
Nike Flex Supreme Tr 2 Grey Running Shoes http://www.jabong.com/Nike-Flex-Supreme-Tr-2-Grey-Running-Shoes-503942.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506088.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506087.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506091.html
Nike Zoom Structure+ 17 Black Running Shoes http://www.jabong.com/Nike-Zoom-Structure-17-Black-Running-Shoes-503949.html
Nike Flyknit Lunar2 Black Running Shoes http://www.jabong.com/Nike-Flyknit-Lunar2-Black-Running-Shoes-503959.html
Phosphorus Brown Loafers http://www.jabong.com/phosphorus-Brown-Loafers-506103.html
Andrew Hill Black Dress Shoes http://www.jabong.com/andrew-hill-Black-Dress-Shoes-506109.html
Phosphorus Tan Loafers http://www.jabong.com/phosphorus-Tan-Loafers-506095.html
Phosphorus Black Loafers http://www.jabong.com/phosphorus-Black-Loafers-506090.html
U.S. Polo Assn. Delta Navy Blue Sneakers http://www.jabong.com/Us_Polo_Assn-Delta-Navy-Blue-Sneakers-518872.html
Phosphorus Brown Loafers http://www.jabong.com/phosphorus-Brown-Loafers-506094.html
Nike Lunar Forever 3 Msl White Running Shoes http://www.jabong.com/Nike-Lunar-Forever-3-Msl-White-Running-Shoes-503971.html
Asics Kayano 20 White Running Shoes http://www.jabong.com/Asics-Kayano-20-White-Running-Shoes-507670.html
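
Some lines in the output above have an empty title and a link of the form javascript:sendSizeFormWithSize('...','6') instead of a plain URL. A minimal sketch for recovering the product URL from such links is shown below. The function name extract_product_url is hypothetical, and the pattern is inferred only from the output above, so it may not cover other link formats the site uses.

```python
import re

# Pattern inferred only from the output above: the product URL is the first
# quoted argument of sendSizeFormWithSize(...).
_JS_LINK = re.compile(r"^javascript:sendSizeFormWithSize\('([^']+)'")


def extract_product_url(link):
    """Return the plain product URL for either kind of link seen above."""
    link = link.strip()
    match = _JS_LINK.match(link)
    return match.group(1) if match else link
```

Applied to product_link before printing, this would give every line a directly usable URL; plain http links pass through unchanged.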