Member: Christof Schöch

Professor of Digital Humanities and member of the Trier Center for Digital Humanities at the University of Trier, Germany. He is also mentor of the early-career research group Computational Literary Genre Stylistics (CLiGS) at the University of Würzburg and Chair of the COST Action Distant Reading for European Literary History. Christof's research and teaching interests lie at the confluence of French literary studies and the digital humanities, especially digital editing and quantitative text analysis. He is also interested in new forms of scholarly publishing and collaboration, and he advocates Open Access to publications and research data. He is an active member of the Romance Studies and Digital Humanities communities.

Deposit: Trading Consequences: A Case Study of Combining Text Mining and Visualization to Facilitate Document Exploration

Large-scale digitization efforts and the availability of computational methods, including text mining and information visualization, have enabled new approaches to historical research. However, we lack case studies of how these methods can be applied in practice and what their impact may be. Trading Consequences is an interdisciplinary research project involving environmental historians, computational linguists, and visualization specialists. It combines text mining and information visualization with traditional research methods in environmental history to explore nineteenth-century commodity trade from a global perspective. Along with a unique data corpus, the project developed three visual interfaces for exploring and analyzing four historical document collections related to commodity trading, comprising approximately 200,000 documents and 11 million pages. In this article, we discuss the potential and limitations of our approach based on feedback we elicited from historians over the course of the project. Our findings, which can inform the design of such tools in the wider context of digital humanities projects, show that visualization-based interfaces are a valuable starting point for large-scale explorations in historical research. Besides offering multiple visual perspectives on the document collection that highlight general patterns, such interfaces should provide the context in which these patterns occur and offer analytical tools for more in-depth investigation.

Deposit: Working with Text in a Digital Age

This Institute will provide 30 participants with three weeks in which (1) to develop hands-on experience with TEI-XML, (2) to apply methods from information retrieval, text visualization, and corpus and computational linguistics to the analysis of textual and linguistic sources in the Humanities, and (3) to rethink not only their own research agendas but also the relationships between their work and non-specialists (e.g., expanded opportunities for undergraduates to make tangible contributions and carry out significant research, new collaborations that transcend boundaries of language and culture, and increased opportunities for the general public to contribute to our understanding of the past). A two-day conference on the theme of the Institute, with an open call for contributions, will follow in the summer of 2013. It will provide both a venue for and a challenge to the issues and ideas raised during the Institute and their importance for the digital humanities.

Deposit: Zur Materialität der historischen Quellen im Zeitalter der digitalen Edition (On the Materiality of Historical Sources in the Age of the Digital Edition)

Preprint, to be published in: Historische Editionen im digitalen Zeitalter: Bestandesaufnahme und Ausblick / Les éditions historiques à l'ère numérique : État des lieux et perspectives, ed. by Pascale Sutter and Sacha Zala, Basel (Schwabe).

The essay discusses the consequences of digital methods for the scholarly editing of historical sources. It reaches the following conclusions. Documents cannot be studied without taking their material features into account. Digital methods enable editors to document those features that are relevant to the critical analysis of the source. The physical text-bearing document is unique: it can never be reproduced, only referenced. Images, verbal descriptions, transcriptions, and even more sophisticated reproduction techniques are always selective. In a digital edition, the Internationalized Resource Identifier (IRI) of the semantic web is the best way to represent the original. Verbal descriptions retain their own value alongside digital images and the analytical methods of materials science. Images are the cheapest way of editing because they convey a great deal of information, although that information remains inaccessible to readers who lack the necessary palaeographical skills. Computers, however, can extract information from images too. Verbal descriptions need controlled vocabularies in order to create machine-readable versions of human-readable editions.

Member: Alberto Campagnolo

Alberto Campagnolo trained as a book conservator (in Spoleto, Italy) and has worked in that capacity at various institutions, including the London Metropolitan Archives, St. Catherine's Monastery (Egypt), and the Vatican Library. He studied Conservation of Library Materials at Ca' Foscari University Venice and holds an MA in Digital Culture and Technology from King's College London. He pursued a PhD on the automated visualization of historical bookbinding structures at the Ligatus Research Centre (University of the Arts, London). He was a CLIR Postdoctoral Fellow (2016-2018) in Data Curation for Medieval Studies at the Library of Congress (Washington, DC). In collaboration with Dot Porter (SIMS, UPenn Libraries, Philadelphia, PA), Alberto has been involved from the outset in the development of VisColl, a model and tool for recording and visualizing the gathering structure of books in codex format. He has served on the Digital Medievalist board since 2014, first as Deputy Director and, since 2015, as Director, and he has been on the Editorial Board of the Journal of Paper Conservation since 2016.

Member: Molly Des Jardin

Molly is the Japanese Studies Librarian and liaison for Korean Studies at the University of Pennsylvania Libraries, and Adjunct Assistant Professor in Penn's East Asian Languages & Civilizations department. In addition to her work as a librarian, she taught the seminar East Asian Digital Humanities (EALC111/511) at Penn in Spring 2018; the syllabus is a living work in progress, available as a PDF. In 2014, together with Katie Rawson, Molly co-founded WORD LAB, the Penn Libraries text analysis learning community, which is still going strong many years later. Molly is a historian of the book in modern Japan, with interests ranging from Meiji-era (1868-1912) publishing to 21st-century urban exploration publications, and she has a particular focus on theories and practices of authorship. Her article "Inventing Saikaku: Collectors, Provenance, and the Social Creation of an Author" appeared in Book History v.20 (2017), and she has co-authored two book chapters with Michael P. Williams (in ACRL's 2019 The Globalized Library and in an upcoming ACTLS monograph on graphic novels in libraries).

Deposit: Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm

This article presents the results of a multidisciplinary project that aimed to better understand the impact of different digitization strategies on computational text analysis. More specifically, it describes an effort to automatically attribute authorship to Jacob or Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition). It reports on how this noise affects the analyses needed to computationally distinguish the two brothers' writing styles. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least for authorship attribution. Our results suggest that attribution is viable even when the training and test sets come from different digitization pipelines. With regard to HTR, this research demonstrates that although automated transcription significantly increases the risk of text misclassification compared to OCR, a text cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.

Member: Folgert Karsdorp

I am a tenure-track researcher at the Meertens Institute of the Royal Netherlands Academy of Arts and Sciences. My research is interdisciplinary and applies computational methods to the humanities, in particular folkloristics. My research interests lie in the development of computational text analysis methods in the context of ethnology, anthropology, literary theory, and cultural evolution (see my résumé for further details). Drop me a line or follow me on Twitter or GitHub.