Scraping with R: SelectorGadget

Introduction: SelectorGadget is a browser tool that helps you scrape useful information from HTML webpages using R. It lets you interactively select and deselect elements of a webpage and generates a CSS selector for them, which you can then use in R to pull those elements into a usable data format. This was useful when my group found a huge list of clubs and organizations at Carleton and wanted to get them into a CSV file; it saved us a lot of typing.

Steps:

  1. Download SelectorGadget from the Chrome Web Store.
  2. Open the webpage you would like to scrape.
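  3. Click on the SelectorGadget button in the top right of the browser toolbar.
  4. Click on an element you would like to select; SelectorGadget highlights everything the current selector matches. Then click on any highlighted element you want to exclude.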
  5. Copy the generated selector, then use R with the “rvest” library to extract the text. In this example, the selector is “h2”.
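
The last step can be sketched in R roughly as follows. This is a minimal example, not the exact script my group used: the inline HTML and the club names are made-up stand-ins for the real page, and the output file name is arbitrary. With a live site you would pass its URL to read_html() instead of building the page with minimal_html().

```r
library(rvest)

# Hypothetical stand-in for the real page. For a live site, use
# read_html("<page URL>") instead of minimal_html().
page <- minimal_html('
  <h1>Student Organizations</h1>
  <h2>Chess Club</h2>
  <p>Meets on Tuesdays.</p>
  <h2>Debate Society</h2>
  <p>Meets on Thursdays.</p>
')

# "h2" is the selector copied from SelectorGadget.
# html_elements() finds every match; html_text2() extracts the text.
clubs <- html_text2(html_elements(page, "h2"))

# Put the names in a data frame and write them to a CSV file.
# The column name and file location are arbitrary choices.
clubs_df <- data.frame(club = clubs)
out_file <- file.path(tempdir(), "clubs.csv")
write.csv(clubs_df, out_file, row.names = FALSE)
```

If the selector picks up extra elements, go back to SelectorGadget, exclude them, and paste the updated selector in place of "h2".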

Kevin
