Álvaro Cuéllar deposited Artificial Intelligence to the Rescue of the Spanish Golden Age: Automatic Transcription and Modernization of One Thousand Three Hundred Theatrical Prints and Manuscripts in the group Spanish Golden Age Literature on Humanities Commons 4 months, 1 week ago
A high percentage of theatrical prints and manuscripts from the Spanish Golden Age have never been transcribed, whether in analog or, naturally, digital form. As a result, these documents cannot be searched, nor can they be subjected to the valuable computational analyses (stylometry, topic modelling, sentiment analysis, etc.) developed in recent years. Using Artificial Intelligence, specifically HTR (Handwritten Text Recognition) techniques in Transkribus, I have trained three models, already publicly available to the research community, that automatically transcribe these documents and modernize their orthography with a high degree of accuracy: around 97% for prints and 91% for manuscripts. With these models I have processed some 1,300 plays preserved in prints and manuscripts from numerous libraries, archives, and other digitized sources. The resulting transcriptions are now part of the ETSO project and the TEXORO search engine. Besides providing an advanced starting point for careful editing of the texts, they are of sufficient quality to undergo stylometric analysis, which is already yielding authorship attributions of interest.
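The abstract does not say how the 97% and 91% figures were measured. HTR results are commonly reported as character-level accuracy, that is, one minus the character error rate (CER) against a hand-checked reference transcription. The sketch below illustrates that calculation under this assumption; the function names `levenshtein` and `character_accuracy` and the sample strings are hypothetical and are not taken from the paper or from Transkribus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, where CER = edit distance / length of the reference."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a 16-character line.
    reference = "la vida es sueño"
    hypothesis = "la vida es sueno"
    print(character_accuracy(reference, hypothesis))  # 0.9375
```

Under this assumed measure, an accuracy of 97% would correspond to roughly three character errors per hundred characters of reference text, and 91% to roughly nine.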