Researchers interested in text mining and corpus analysis may download a zip file of the documents that make up the Phase 1 Repository. These documents have been OCR'd and encoded as UTF-8. They can be uploaded as a single zip file to Voyant Tools for basic text mining and visualization, or used in other analytics tools such as AntConc. We recommend that users apply this list of suggested stop words (words that should be excluded from word counts and frequencies). Note that proper nouns are not part of the stop words list.
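For researchers who prefer to script their own analysis, the following is a minimal Python sketch of the workflow described above: read the UTF-8 text files directly from the zip archive, drop the stop words, and count the remaining word frequencies. It rests on several assumptions not stated on this page: that the archive contains plain `.txt` files, that the stop word list has one word per line, and that a simple letters-only tokenizer is adequate. The function names `load_stop_words` and `count_words` are illustrative only.

```python
import re
import zipfile
from collections import Counter

# Runs of Unicode letters, allowing internal apostrophes (e.g. "don't").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def load_stop_words(text):
    """Parse a stop word list, assuming one word per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def count_words(zip_source, stop_words):
    """Count word frequencies across every .txt file in a zip archive.

    zip_source may be a file path or a binary file-like object.
    Tokens are lowercased before being compared with the stop words.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Assumption: the corpus documents are plain .txt files.
            if not name.lower().endswith(".txt"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            tokens = (token.lower() for token in TOKEN_RE.findall(text))
            counts.update(t for t in tokens if t not in stop_words)
    return counts
```

To use it, read the stop word file into a string, pass it to `load_stop_words`, then call `count_words` with the path to the downloaded zip file; `most_common(20)` on the result lists the twenty most frequent remaining words. Because proper nouns are not in the stop words list, names of people and places stay in the counts (in lowercase here).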
Special thanks to Stewart Varner, Digital Scholarship Librarian at the University of North Carolina at Chapel Hill, for his assistance in processing the corpus data.