Abstract

Research on interlinking lexical resources is a real challenge, because it addresses the difficult problem of semantic understanding and, more precisely, of disambiguating word meanings, a task known as Word Sense Disambiguation (WSD). In the current and prospective context of the information society, the fundamental works of a national culture need to be available in digital format. Creating a representative corpus of a language that is accessible through the Internet is a topical issue throughout the world, since such a corpus gives a concrete, clear picture of how that language is used. In this study we describe the development of a Romanian-language gold-standard (GOLD) corpus for words with multiple meanings. We propose a corpus annotation standard based on three lexical resources: from the Thesaurus Dictionary of the Romanian Language in electronic format (eDTLR) we extracted a list of words with multiple meanings; from the Reference Corpus for Contemporary Romanian Language (CoRoLa) we extracted the contexts in which these words were found; and from the Romanian WordNet (RoWN) we took the sense of each word in its corpus context.

