Abstract
There has been extensive work on human word sense annotation, i.e., manually labeling word uses in natural texts according to their senses. Such labels were primarily created for the tasks of Word Sense Disambiguation (WSD) and Word Sense Induction (WSI). However, almost all sense-annotated datasets are synchronic: they contain texts created within a relatively short period of time and often do not provide the creation dates of the texts. This ignores possible applications in diachronic-historical settings, where the aim is to induce or disambiguate historical word senses or changes in senses across time. To facilitate investigations into historical WSD and WSI and to establish connections with the task of Lexical Semantic Change Detection (LSCD), there is a crucial need for historical sense-annotated data. Hence, we created a new, reliable diachronic WSD/WSI dataset, ‘DWUG DE Sense’. We describe its preparation and annotation and analyze its central statistics. We then describe a thorough evaluation of different prediction systems for jointly solving the WSI and LSCD tasks. All our systems are based on a state-of-the-art architecture that combines Word-in-Context models with graph clustering techniques and differ in their hyperparameter settings. Our findings reveal that using the WSI task as the optimization criterion yields better results for both tasks, even when performance on the LSCD task is the primary goal. This underscores that, although the two tasks are related, WSI seems to be the more general task and able to incorporate the LSCD task.