Abstract

The medieval civilisation of Europe can only be investigated indirectly, through the diligent study of the numerous traces that have survived to our times. The best source of our knowledge remains the surviving texts, owing both to their huge quantity and to their surprising variety. Written mainly in Medieval Latin, within a social context that had nothing in common with either ancient times or our own, they have not benefited as they deserve from recent advances in computational linguistics or text mining. This is due, among other things, to the generally poor quality of existing resources, the inadequate design of user search interfaces and the unsatisfactory application of Natural Language Processing and Digital Humanities methods to the study of ancient texts. To address this situation, we propose to build a large, representative and balanced corpus of Medieval Latin texts composed between 500 and 1500 AD across Europe. The corpus will be annotated with part-of-speech (PoS), lemma, time and place labels and enriched by close links to a collection of dictionaries and encyclopaedias. For both the textual and the lexicographical resources, we will develop tools for efficient statistical analysis and data visualisation, aimed at revealing cultural and societal patterns that are yet to be discovered in the Latin words.
