Using Dimensionality Reduction and Tag Parameter Spaces to Study Historical Change in a Large Document Archive

    Tim Hitchcock, William J Turkel
    CSDH-SCHN 2021: Making the Network
    History, Databases, Machine learning, Crime, Punishment
    Item Type:
    Meeting Title:
    CSDH/SCHN Conference 2021
    Meeting Org.:
    Canadian Society for Digital Humanities (CSDH/SCHN)
    Meeting Loc.:
    Remote, hosted from Edmonton, AB
    Meeting Date:
    May 30 – June 3, 2021
    Old Bailey Online, Text linguistics, Historical databases, Representation, Crime and punishment
    Permanent URL:
    In this presentation we discuss one approach to studying historical change in a large document archive, The Old Bailey Proceedings Online. In addition to the texts themselves, we are working with two kinds of representation. The first is a set of XML tags that were added to the trial accounts when the digital archive was created. Since these tags were drawn from small finite sets, we can think of them as dimensions that can be used to categorize each trial in a tag parameter space. The second is a dimension reduction technique, Stable Random Projections (Schmidt 2018). Each SRP is a small sketch, or fingerprint, of a given trial, and each trial can be located in a space of SRPs. We are using SRPs in conjunction with the parameter space created by the XML tags to assess the representativeness of trials in particular periods of time and to identify outliers and anomalies. As Schmidt showed in his own examples, clusters in SRP space occur at a variety of scales, and can often be mapped onto classifications that are meaningful to human observers (e.g., as represented by the XML tags).
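    The idea of a small, reproducible sketch per trial can be illustrated with a minimal Python example. This is only a sketch of a hashing-based random projection in the spirit of SRP, not the authors' code and not Schmidt's implementation: the function names, the SHA-256 hashing scheme, the log weighting, and the choice of 64 dimensions are all assumptions made for illustration.

```python
import hashlib
import math
import re
from collections import Counter


def word_signs(word, dim):
    """Derive a reproducible vector of +1/-1 values from hashes of a word.

    Hypothetical helper: because the signs depend only on the word itself,
    the same word maps to the same vector in every document.
    """
    signs = []
    counter = 0
    while len(signs) < dim:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        for byte in digest:
            for bit in range(8):
                signs.append(1.0 if (byte >> bit) & 1 else -1.0)
        counter += 1
    return signs[:dim]


def sketch(text, dim=64):
    """Project a text's word counts into a fixed number of dimensions."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [0.0] * dim
    for word, n in counts.items():
        weight = math.log(1 + n)  # assumed damping of frequent words
        for i, sign in enumerate(word_signs(word, dim)):
            vec[i] += weight * sign
    return vec


def cosine(a, b):
    """Cosine similarity between two sketches (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented example sentences, not text from the Proceedings.
trial_a = sketch("the prisoner was indicted for stealing a silver watch")
trial_b = sketch("the prisoner was indicted for stealing a linen shirt")
similarity = cosine(trial_a, trial_b)
```

    Under these assumptions, trials whose sketches have low similarity to most others from the same period, or from the same combination of tag values, would be candidates for closer inspection as outliers.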
    Last Updated:
    3 years ago
    All Rights Reserved


    Item Name: hitchcock-turkel-csdh-2021.pdf