Data Mining and Vogue

Robots Reading Vogue

Robots Reading Vogue is a digital humanities project launched by the Yale University Library that explores the intersection of two seemingly disparate research interests: fashion and data mining. Because Vogue offers a century's worth of text and images in a well-digitized online archive, Peter Leonard, Yale's new Librarian for Digital Humanities Research, together with Lindsay King, an arts librarian interested in fashion, initiated a series of digital humanities projects using the Vogue Archive. Below are some exemplary projects from Robots Reading Vogue.

Robots Reading Vogue Projects

One specific project that I am entranced by is n-gram Search. It is a text-mining tool that searches for and compares word usage across the archive's 400,000+ pages, within ads, articles, or all text. For example, usage of the words “beauty,” “cosmetic,” and “fashion” shows the following trends over the years. Scholars such as Cindy Craig and Colleen Seale have used this tool in their women's and gender studies research, Striking Gold: Using Free Text Mining Tools to Explore Women's and Gender Studies Literature.

n-gram Search


All of the data for the Robots Reading Vogue projects, including the corpus used in n-gram Search, comes from the Vogue Archive. The Vogue Archive was launched by ProQuest and Condé Nast in 2011 and contains the entire run of the US edition of Vogue from 1892 to the present day, which amounts to more than 400,000 pages! Yale University Library purchased a perpetual access license, which confers the right to use the raw files for scholarly research.


It is unclear exactly what code Lindsay King and Peter Leonard used to create the n-gram Search project, but it is certain that an n-gram search algorithm is used to look up the desired text in the corpus efficiently. The website shows that one can also adjust the date range, the metric, and the case sensitivity of the word search, so the implementation must include corresponding logic to support these options.
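Although the project's actual code is not public, the core idea can be sketched. Below is a minimal, hypothetical Python version of an n-gram (here, single-word) frequency search over a dated corpus, with the date-limit and case-sensitivity options the website exposes. The function name, the corpus shape, and the relative-frequency metric are my own assumptions, not the project's implementation.

```python
from collections import Counter

# A minimal sketch of an n-gram frequency search -- NOT the actual
# Robots Reading Vogue code, which is not public. The corpus is assumed
# to be a list of (year, text) pairs, one per digitized page.
def ngram_trend(corpus, term, start=1892, end=2020, case_sensitive=False):
    """Return {year: relative frequency of `term`} within the date range."""
    counts, totals = Counter(), Counter()
    for year, text in corpus:
        if not (start <= year <= end):       # date-limit option
            continue
        words = text.split()
        if not case_sensitive:               # case-sensitivity option
            words = [w.lower() for w in words]
            term_key = term.lower()
        else:
            term_key = term
        counts[year] += words.count(term_key)
        totals[year] += len(words)
    # Relative frequency is one possible metric; raw counts are another.
    return {y: counts[y] / totals[y] for y in sorted(totals) if totals[y]}

corpus = [(1950, "Beauty and fashion on every page"),
          (1950, "A new cosmetic for modern beauty"),
          (1960, "Fashion forward looks")]
print(ngram_trend(corpus, "beauty"))  # relative frequency per year
```

A real deployment over 400,000+ pages would need an inverted index or a precomputed frequency table rather than a linear scan, but the options map onto the same parameters.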


Bookworm is used to present the n-gram Search results. It is a data visualization tool created by Benjamin Schmidt (Department of History, Northeastern University), Matt Nicklay, Neva Cherniavsky Durand, Martin Camacho, and Erez Lieberman Aiden at the Cultural Observatory. The tool lets users explore and interact with lexical trends conveniently.
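Bookworm itself is a browser-based application, but the kind of chart it renders can be approximated in a few lines. The sketch below uses matplotlib (my choice for illustration, not Bookworm's actual stack) to plot an invented trend line; none of the numbers come from the Vogue Archive.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical yearly frequencies (words per 1,000) -- illustrative only,
# NOT real Vogue Archive data.
years = [1900, 1920, 1940, 1960, 1980, 2000]
freq_beauty = [0.8, 1.1, 1.5, 1.9, 1.6, 1.2]

plt.plot(years, freq_beauty, marker="o", label='"beauty"')
plt.xlabel("Year")
plt.ylabel("Frequency per 1,000 words")
plt.title("Lexical trend over time (illustrative)")
plt.legend()
plt.savefig("trend.png")
```

Bookworm adds interactivity on top of this kind of line chart: hovering on a point, changing the metric, and clicking through to the underlying pages.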

I am genuinely very curious what exact code was used to create the n-gram Search project. I would love to learn how to deploy an n-gram search algorithm in an application and visualize the results on an interactive platform. Unfortunately, I could not find the code for this project; it is probably not open source.


One Comment

  1. I think it’s really interesting how these different digital humanities projects vary in how open source they are. In the project that I explored, I had much more detailed publications to read, and the scholars posted code for almost every source. Perhaps the target audiences are different, so the openness of the projects differs too.
