"Q i-jtb the Raven": Taking Dirty OCR Seriously

    Author(s):
    Ryan Cordell
    Date:
    2017
    Group(s):
    Digital Humanities, LLC 19th-Century American, TM Bibliography and Scholarly Editing, TM Book History, Print Cultures, Lexicography
    Subject(s):
    Bibliography, Descriptive bibliography, Book history, Digital humanities, Media archaeology
    Item Type:
    Article
    Permanent URL:
    http://dx.doi.org/10.17613/M6WG2S
    Abstract:
    This article argues that scholars must understand mass digitized texts as assemblages of new editions, subsidiary editions, and impressions of their historical sources, and that these various parts require sustained bibliographic analysis and description. To adequately theorize any research conducted in large-scale text archives—including research that includes primary or secondary sources discovered through keyword search—we must avoid the myth of surrogacy proffered by page images and instead consider directly the text files they overlay. Focusing on the OCR (optical character recognition) from which most large-scale historical text data derives, this article argues that the results of this "automatic" process are in fact new editions of their source texts that offer unique insights into both the historical texts they remediate and the more recent era of their remediation. The constitution and provenance of digitized archives are, to some extent at least, knowable and describable. Just as details of type, ink, or paper, or paratext such as printer's records can help us establish the histories under which a printed book was created, details of format, interface, and even grant proposals can help us establish the histories of corpora created under conditions of mass digitization.
    Metadata:
    Published as:
    Journal article    
    Status:
    Published
    Last Updated:
    11 months ago
    License:
    Attribution-NonCommercial-ShareAlike

    Downloads

    Item Name: 2017-bookhistory-qitjbtheraven.pdf