Correcting a Bias in TIGER Rates Resulting from High Amounts of Invariant and Singleton Cognate Sets

    Author(s):
    Johann-Mattis List
    Date:
    2021
    Group(s):
    Digital Humanists, History of Linguistics and Language Study, Linguistics, NLP for Ancient languages
    Subject(s):
    Computational linguistics, Historical linguistics
    Item Type:
    Article
    Tag(s):
    phylogenetics, phylogenetic reconstruction
    Permanent URL:
    http://dx.doi.org/10.17613/0n1n-3352
    Abstract:
    In a recent issue of the Journal of Language Evolution, Syrjänen et al. (2021) investigate the suitability of computing Cummins and McInerney’s (2011) TIGER rates for estimating the tree-likeness of linguistic datasets compiled for phylogenetic reconstruction. The authors test the TIGER rates on a diverse sample of simulated data, which by and large confirms the usefulness of TIGER rates as an analytic tool for investigating linguistic data, but they test them on only one real-world dataset of Uralic languages, which turns out to behave quite differently from the simulated data. When testing the TIGER rates on additional datasets, I detected a bias in the computation which leads to an unnatural increase in the rates in those cases where a dataset contains many characters with invariant or singleton states. To overcome this problem, I suggest a modified variant of TIGER rates, which is provided in the form of a freely available Python package. Testing the modified TIGER scores on the simulated data of Syrjänen et al. shows that the corrected TIGER rates still readily distinguish between different degrees of tree-likeness. Testing them on a dataset in which the number of singletons and invariants was artificially increased further shows that the corrected TIGER rates are not influenced by the bias. A final test on seven linguistic datasets shows the usefulness of the corrected TIGER rates on a larger variety of linguistic datasets and illustrates the importance of taking specific aspects of linguistic data into account when using biological methods in the domain of language evolution.
    Notes:
    Author final version of an article accepted for publication in the Journal of Language Evolution
    Metadata:
    Published as:
    Journal article    
    Status:
    Published
    Last Updated:
    1 year ago
    License:
    Attribution

    Downloads

    Item Name: list-2021-corrected-tiger-rates.pdf