Abstract

Lexical cohesion is a fundamental mechanism of text: to understand a coherent context, a pair of words must be interpreted as standing in a certain type of lexical relation (e.g., similarity). We refer to such relations as contextual lexical relations. However, work on lexical cohesion has not modeled context comprehensively when considering lexical relations, owing to a lack of linguistic resources. In this paper, we take initial steps to address contextual lexical relations by focusing on the contrast relation, which is well known yet more subtle and relatively less resourced. We present a corpus named Cont 2 Lex to make Contextual Lexical Contrast (CLC) Recognition a computationally feasible task. We benchmark this task with widely adopted semantic representations and find that contextual embeddings (e.g., BERT) generally outperform static embeddings (e.g., GloVe), but barely exceed 70% accuracy. In addition, all embeddings perform better when a CLC occurs within a single sentence, suggesting possible limitations of current computational coherence models. Another intriguing finding is that BERT's improvement on CLC is largely attributable to its modeling of CLC word pairs that co-occur with other word repetitions. These observations imply that progress in lexical coherence modeling remains relatively primitive, even for semantic representations such as BERT that have enabled numerous standard NLP tasks to approach human benchmarks. By presenting our corpus and benchmark, we aim to seed initial discussion and effort toward advancing semantic representations from modeling the syntactic and semantic levels to modeling the coherence and discourse levels.
