• Machine Reading the Primeros Libros

    Author(s):
    Hannah Alpert-Abrams
    Date:
    2016
    Subject(s):
    Colonial Latin American literature and culture, Digital humanities, Early modern studies
    Item Type:
    Article
    Tag(s):
    machine learning, algorithms, new spain, mexico
    Permanent URL:
    http://dx.doi.org/10.17613/M6SC9G
    Abstract:
    Early modern printed books pose particular challenges for automatic transcription: uneven inking, irregular orthographies, radically multilingual texts. As a result, modern efforts to transcribe these documents tend to produce the textual gibberish commonly known as "dirty OCR" (Optical Character Recognition). This noisy output is most frequently seen as a barrier to access for scholars interested in the computational analysis or digital display of transcribed documents. This article, however, proposes that a closer analysis of dirty OCR can reveal both historical and cultural factors at play in the practice of automatic transcription. To make this argument, it focuses on tools developed for the automatic transcription of the Primeros Libros collection of sixteenth-century Mexican printed books. By bringing together the history of the collection with that of the OCR tool, it illustrates how the colonial history of these documents is embedded in, and transformed by, the statistical models used for automatic transcription. It argues that automatic transcription, itself a mechanical and practical tool, also has an interpretive effect on transcribed texts that can have practical consequences for scholarly work.
    Metadata:
    Published as:
    Journal article    
    Status:
    Published
    Last Updated:
    5 years ago
    License:
    Attribution-NonCommercial-NoDerivatives

    Downloads

    Item Name: dhq_-digital-humanities-quarterly_-machine-reading-the-primeros-libros.pdf