Literary Studies Meet Corpus Linguistics

Marin Laak, Kaarel Veskis, Olga Gerassimenko, Neeme Kahusk, Kadri Vider

doi:10.5617/dhnbpub.11103

Abstract

The digitalisation of cultural heritage in Estonia has been under way in recent years, and we will see extensive mass digitalisation of printed books and handwritten documents in the very near future. This situation and its potential raise questions about how the literary heritage can be used. In our paper, we consider the benefits for digital literary research that arise from representing a literary text collection as an annotated language resource. We discuss a pilot project to create a text corpus based on the private letters exchanged between two Estonian avant-garde writers at the beginning of the 20th century. We describe the advantages and possibilities of the corpus query system KORP, which we have chosen for representing and searching literary-heritage DH corpora as a language resource. Finally, we discuss the challenges that applying Natural Language Processing and Text and Data Mining poses for the preparation and representation of texts, along with the benefits for research.
