We propose a proof-of-concept application that will experiment with the use of active learning and other iterative techniques for the correction of eighteenth-century texts provided by the HathiTrust Digital Library and the 2,231 ECCO text transcriptions released into the public domain by Gale and distributed by the Text Creation Partnership (TCP) and 18thConnect. In an application based on active learning or a similar approach, the user could identify dozens or hundreds of difficult characters that appear in articles from the same time period, and the system would use this new knowledge to improve optical character recognition (OCR) across the entire corpus. A portion of our efforts will focus on the need to incentivize engagement in tasks of this type, whether they are traditionally crowdsourced or carried out through a more active, iterative process like the one we propose. We intend to examine how explorations of users’ preferences can improve their engagement with corpora of materials.
Fifty years of software development experience. Participated in a number of Humanities computing projects.
humanities computing, social media, popular romance fiction, readership, authorship, creative writing, libraries, archives
Interdisciplinary research in the humanities requires indexing that represents multiple disciplinary perspectives. Most literature has been indexed using traditional models for subject analysis that are either too broad to be helpful or represent a single disciplinary perspective. We question whether traditional print models of subject analysis serve humanistic researchers’ needs in working with digital content. It is beyond the capacity of libraries to re-index this body of literature relying on human indexers. We need to develop scalable tools both to re-index extant bodies of literature and to index newly created literature. Web-scale searching, computational text analysis, and automated indexing each hold promise for addressing various aspects of the problem, but none seems to solve it fully. This project will gather a group of scholars with expertise in the humanities, computational analysis of texts, and library and information science to design an approach to the problem.
The Crowded Page is an Internet-based humanities computing project whose goal is to create data-mining and visualization tools that will allow researchers to map out the intricate connections between the members of artistic and literary communities. In most accounts of literary and art history, a work of art or literature is said to be the product of a single creative mind. In an effort to make visible what is often obscured in traditional histories of art and literature, The Crowded Page seeks to take advantage of the unique capabilities of the digital medium to foreground the ways in which a complex network of friends, editors, neighbors, lovers, and fellow artists and writers informs the creative process.
This project will advance research in the humanities by adding a variety of simulation techniques to the standard repertoire of methods already employed by humanists. Interested humanists from a range of disciplines including philosophy, history, archeology, linguistics, anthropology and political science, among others, will work not only with technical experts but also with humanists already familiar with methods involving computer simulations and models. Our aim in bringing technologists and humanists together in precisely this way is to promote the dual notion of “the humanities shaping technology” as well as “technology shaping the humanities.” Modeling experts will be pressed to not merely present existing techniques but to shape those techniques in ways that address questions and on-going inquiries pursued by humanists. Twenty-four humanists will spend 3 weeks in June 2011 and 3 days in 2012 interacting with modeling experts.
The Center for Digital Humanities (CDH) at the University of South Carolina will partner with the Institute for Computing in Humanities, Arts, and Social Science (I-CHASS) at the University of Illinois at Urbana-Champaign and the National Center for Supercomputing Applications (NCSA) to foster innovation in the research and development of computational resources for humanities research groups. The Humanities High Performance Computing Collaboratory (HpC) will engage scholars in a year-long collaboration with computing specialists in order to: 1) receive a comprehensive education in four computational concentrations; 2) receive instruction in digital humanities project design and management; 3) obtain hands-on experience with a variety of technical platforms; 4) work with technical staff to outline pilot explorations in at least one area of computational concentration; and 5) join a year-long virtual community where scholars will support their peers in authoring digital humanities projects. Participants will come from a wide range of institutions, with a particular focus on recruiting students and faculty from Historically Black Colleges and Universities and Tribal Colleges.
This project represents a multi-organizational, interdisciplinary effort to enhance collaborative research in cultural heritage fields by exploring user experience with Web-based technologies. The objective of this project is to document user needs around online systems for sharing primary data and documentation of cultural heritage collections. To this end, we will draw upon the experience and insights of representatives from different stakeholder groups in three broad arenas: academic researchers, heritage managers, and specialist communities. Investigations undertaken in this study will result in best-practice guidelines that show humanities computing efforts how best to meet diverse user needs in future online data sharing systems. Using an iterative cycle of development, deployment, and evaluation, this project will enhance Open Context, a collaborative, open-access data sharing system already in use for archaeology and related disciplines.