DIY Textual Analysis

This screenshot suggests some of the ways in which the corpus can be analyzed at scale using freely available tools such as Voyant.

Researchers interested in text mining and corpus analysis may download a zip file of the documents that comprise the Phase 1 Repository. These documents have been OCR'd and encoded as UTF-8. They can be uploaded as a single zip file to Voyant Tools for basic text mining and visualization, or used in other analysis tools such as AntConc. We recommend applying this list of suggested stop words: words that should be excluded when calculating word counts and frequencies. Note that proper nouns are not part of the stop words list.
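For researchers who prefer scripting to Voyant or AntConc, the same basic word-frequency count can be approximated in a few lines of Python. The sketch below is not the project's own processing pipeline. It assumes that the downloaded zip contains plain-text `.txt` files and that the stop words list has been saved as a text file with one word per line. The function names, the file names in the usage note, and the tokenizing rule (runs of letters, lowercased) are illustrative choices, not part of the corpus release.

```python
import io
import re
import zipfile
from collections import Counter
from typing import BinaryIO, Iterable, Union

# Illustrative tokenizer: a "word" is a run of Unicode letters,
# optionally joined by internal apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def load_stop_words(lines: Iterable[str]) -> set[str]:
    """Build a lowercase stop-word set; assumes one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def word_frequencies(corpus_zip: Union[str, BinaryIO], stop_words: set[str]) -> Counter:
    """Count words in every .txt member of the zip, skipping stop words.

    Assumes the members are UTF-8 plain text, as the corpus is described.
    Words are lowercased, so "Ohio" and "ohio" are counted together.
    """
    counts = Counter()
    with zipfile.ZipFile(corpus_zip) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip directories and non-text members
            # Stream each file line by line instead of loading it whole.
            with io.TextIOWrapper(zf.open(name), encoding="utf-8") as text:
                for line in text:
                    for token in WORD_RE.findall(line):
                        word = token.lower()
                        if word not in stop_words:
                            counts[word] += 1
    return counts
```

As a usage example, with the downloads saved under the hypothetical names `stopwords.txt` and `corpus.zip`, open the stop words file, pass it to `load_stop_words`, and call `word_frequencies("corpus.zip", stops).most_common(20)` to list the twenty most frequent remaining words. Because the stop words list deliberately leaves out proper nouns, names of people and places will remain in the counts.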

Download the Corpus Data

Explore Preliminary Linguistic Analysis Visualizations

Special thanks to Stewart Varner, Digital Scholarship Librarian at the University of North Carolina at Chapel Hill, for his assistance in processing the corpus data.