Abstract

Lexical simplification (LS) aims to replace complex words with simpler alternatives. LS commonly consists of three main steps: complex word identification, substitute generation, and substitute ranking. Existing LS methods use the contextual information of the complex word only in the last step (substitute ranking). However, they overlook two facts: (1) the complexity of a polysemous word depends closely on its context; (2) generating substitutes without regard to context inevitably produces a large number of spurious candidates. We therefore propose LSBert, a novel LS system based on the pretrained language model BERT, which exploits the wider context both when identifying the words in need of simplification and when generating substitute candidates for them. Specifically, LSBert consists of a network for complex word identification obtained by fine-tuning BERT and a BERT-based network for substitute generation. Experimental results show that LSBert performs well on both complex word identification and substitute generation, achieving state-of-the-art results on three benchmarks. To facilitate reproducibility, the code of the LSBert system is available at https://github.com/qiang2100/BERT-LS.
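
As an illustration of the substitute-generation idea described above, the following minimal sketch masks the complex word and reads candidate replacements from BERT's masked-language-model head via the Hugging Face transformers library. The checkpoint name, the sentence-pair encoding of the original and masked sentence, and the candidate filtering are illustrative assumptions, not the paper's exact procedure; consult the linked repository for the authors' implementation.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Load a pretrained BERT masked language model (checkpoint choice is an assumption).
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
model.eval()

def generate_substitutes(sentence, complex_word, top_k=10):
    """Propose substitute candidates for `complex_word` in `sentence` by masking it
    and reading BERT's predictions for the masked slot. The original sentence is
    paired with the masked copy so the model conditions on both the context and
    the complex word itself (a sketch, not the paper's exact setup)."""
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    inputs = tokenizer(sentence, masked, return_tensors="pt")  # sentence pair: original + masked copy
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos[0]].softmax(dim=-1)
    top_ids = torch.topk(probs, top_k).indices.tolist()
    candidates = tokenizer.convert_ids_to_tokens(top_ids)
    # Drop the complex word itself and subword pieces from the candidate list.
    return [c for c in candidates if c != complex_word.lower() and not c.startswith("##")]

print(generate_substitutes("The cat perched on the mat.", "perched"))
```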
