Abstract

Much as the social landscape in which languages are spoken shifts, language too evolves to suit the needs of its users. Lexical semantic change analysis is a burgeoning field of semantic analysis which aims to trace changes in the meanings of words over time. This paper presents an approach to lexical semantic change detection based on Bayesian word sense induction, which is suitable for identifying novel word senses. The approach was submitted to SemEval-2020 Task 1, where it proved capable of performing the task. The same approach is also applied to a corpus gleaned from 15 years of Twitter data, and the results are used to identify words which may be instances of slang.

Highlights

  • Automatic lexical semantic change detection is a field of semantic analysis which aims to discern how the meanings of words change over time

  • SemEval-2020 Task 1, “Unsupervised Lexical Semantic Change Detection” (Schlechtweg et al, 2020), provides a single unified framework and a standardised dataset for comparing approaches, addressing the difficulty of comparing results obtained with different procedures, languages and corpora

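  • An initial random set of senses is induced, modelled after the generative process of the Hierarchical Dirichlet Process (HDP), which corresponds to a partition of words inside documents. The process is based on the Chinese Restaurant Franchise (CRF) representation of a two-level HDP (Teh et al, 2004), which partitions customers at the group level according to

    $$P(t_{ji} = t \mid t_{j1}, \ldots, t_{j,i-1}, \alpha) \propto \begin{cases} n_{jt} & \text{if } t \text{ previously used} \\ \alpha & \text{if } t = t^{\text{new}} \end{cases}$$

    where $n_{jt}$ is the number of customers in group $j$ already seated at table $t$ and $\alpha$ is a concentration parameter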
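The group-level seating rule in the last highlight can be illustrated with a minimal Python sketch. It covers only the table-assignment step of the CRF for a single group, not the full two-level HDP or the paper's sense induction procedure. The function names `crf_table_probabilities` and `seat_customers` are hypothetical and chosen here for illustration.

```python
import random
from typing import List, Tuple


def crf_table_probabilities(table_counts: List[int], alpha: float) -> List[float]:
    """Seating probabilities for the next customer in one group.

    Entry t is proportional to n_jt (customers already at table t);
    the final entry is proportional to alpha and stands for a new table.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(table_counts) + alpha
    return [n / total for n in table_counts] + [alpha / total]


def seat_customers(
    num_customers: int, alpha: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Seat customers one at a time; return (assignments, table_counts)."""
    table_counts: List[int] = []
    assignments: List[int] = []
    for _ in range(num_customers):
        probs = crf_table_probabilities(table_counts, alpha)
        table = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if table == len(table_counts):
            table_counts.append(1)  # open a new table
        else:
            table_counts[table] += 1  # join an existing table
        assignments.append(table)
    return assignments, table_counts


if __name__ == "__main__":
    assignments, counts = seat_customers(20, alpha=1.0, rng=random.Random(0))
    print(assignments, counts)
```

Larger values of `alpha` make new tables more likely, so the random initial partition tends to contain more groups.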


Summary

Introduction

Automatic lexical semantic change detection is a field of semantic analysis which aims to discern how the meanings of words change over time. As interest in the field has increased, a variety of different procedures, languages and corpora have been used, which makes it difficult to compare different sets of results. SemEval-2020 Task 1, “Unsupervised Lexical Semantic Change Detection” (Schlechtweg et al, 2020), addresses this difficulty by providing a single unified framework and a standardised dataset with which to compare approaches. The task involves determining whether each of a set of target words has changed meaning between two corpora, each of which corresponds to a different time period. We also apply our method to a corpus constructed from Twitter data in order to explore the possibility of detecting semantic change over a shorter period of time, with a focus on using that detection to identify slang.
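To make the task setup concrete, the sketch below shows one way induced sense labels from two time periods could be turned into a binary "changed" decision. This is an illustrative assumption, not the decision rule used in the paper: the function names and the thresholds `min_share` and `max_share` are hypothetical. A word is flagged when some sense is well attested in one period and nearly absent in the other.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def sense_proportions(sense_labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each induced sense among a word's usages."""
    counts = Counter(sense_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sense: count / total for sense, count in counts.items()}


def has_changed(
    labels_t1: Iterable[Hashable],
    labels_t2: Iterable[Hashable],
    min_share: float = 0.1,   # hypothetical: share needed to count as attested
    max_share: float = 0.01,  # hypothetical: share allowed to count as absent
) -> bool:
    """Flag a word if any sense is attested in one period and absent in the other."""
    p1 = sense_proportions(labels_t1)
    p2 = sense_proportions(labels_t2)
    for sense in set(p1) | set(p2):
        a, b = p1.get(sense, 0.0), p2.get(sense, 0.0)
        gained = b >= min_share and a <= max_share
        lost = a >= min_share and b <= max_share
        if gained or lost:
            return True
    return False


if __name__ == "__main__":
    earlier = ["s0"] * 10
    later = ["s0"] * 6 + ["s1"] * 4  # a sense unseen in the earlier period
    print(has_changed(earlier, later))
```

A sense that is gained in the later period corresponds to the novel word senses mentioned in the abstract. The thresholds would need tuning on real data.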
