Deposit: Active OCR: Tightening the Loop in Human Computing for OCR Correction

We propose a proof-of-concept application that will experiment with the use of active learning and other iterative techniques for the correction of eighteenth-century texts provided by the HathiTrust Digital Library and the 2,231 ECCO text transcriptions released into the public domain by Gale and distributed by the Text Creation Partnership (TCP) and 18thConnect. In an application based on active learning or a similar approach, the user could identify dozens or hundreds of difficult characters that appear in texts from the same time period, and the system would use this new knowledge to improve optical character recognition (OCR) across the entire corpus. A portion of our efforts will focus on the need to incentivize engagement in tasks of this type, whether they are traditionally crowdsourced or carried out through a more active, iterative process like the one we propose. We intend to examine how exploring users’ preferences can improve their engagement with corpora of materials.

Member: Emmanuel Mkpojiogu

Emmanuel O.C. Mkpojiogu holds a Bachelor of Science degree in Statistics and Computer Science with First Class Honors from the University of Nigeria. He also graduated from Universiti Utara Malaysia with a Master of Science degree in Information Technology with Distinction (majoring in Software Engineering and Human-Computer Interaction). Presently, he lectures at Veritas University Abuja (The Catholic University of Nigeria), Abuja, Nigeria. He has also published several academic articles in reputable international journals.

Member: Grace Afsari-Mamagani

I’m a project manager and learning experience designer pursuing a PhD in literature. I’m particularly interested in digital pedagogy and technology integration in the humanities in higher ed. Professionally, I’ve worked with learners in K-12 environments, as well as college and graduate students, to make concepts like data, networked devices, and digital surveillance accessible and actionable. My literary criticism focuses on contemporary literature, the urban environment, and embodiment as a means of theorizing human-computer interaction, “play,” and experiential learning.

Deposit: Evolutionary Subject Tagging in the Humanities

Interdisciplinary research in the humanities requires indexing that represents multiple disciplinary perspectives. Most literature has been indexed using traditional models for subject analysis that are either too broad to be helpful or represent a single disciplinary perspective. We question whether traditional print models of subject analysis serve humanistic researchers’ needs in working with digital content. It is beyond the capacity of libraries to re-index this body of literature relying on human indexers. We need to develop scalable tools both to re-index extant bodies of literature and to index newly created literature. Web-scale searching, computational text analysis, and automated indexing each hold promise for addressing various aspects of the problem, but none seems to address it fully. This project will gather a group of scholars with expertise in the humanities, computational analysis of texts, and library and information science to design an approach to the problem.

Deposit: The Crowded Page

The Crowded Page is an Internet-based humanities computing project whose goal is to create data-mining and visualization tools that will allow researchers to map out the intricate connections between the members of artistic and literary communities. In most accounts of literary and art history, a work of art or literature is said to be the product of a single creative mind. In an effort to make visible what is often obscured in traditional histories of art and literature, The Crowded Page seeks to take advantage of the unique capabilities of the digital medium to foreground the ways in which a complex network of friends, editors, neighbors, lovers, and fellow artists and writers informs the creative process.

Deposit: Computer Simulations in the Humanities

This project will advance research in the humanities by adding a variety of simulation techniques to the standard repertoire of methods already employed by humanists. Interested humanists from a range of disciplines including philosophy, history, archeology, linguistics, anthropology, and political science, among others, will work not only with technical experts but also with humanists already familiar with methods involving computer simulations and models. Our aim in bringing technologists and humanists together in precisely this way is to promote the dual notion of “the humanities shaping technology” as well as “technology shaping the humanities.” Modeling experts will be pressed not merely to present existing techniques but to shape those techniques in ways that address the questions and ongoing inquiries pursued by humanists. Twenty-four humanists will spend three weeks in June 2011 and three days in 2012 interacting with modeling experts.