Abstract

Lexical simplification (LS) aims to replace complex words with simpler alternatives. LS commonly consists of three main steps: complex word identification, substitute generation, and substitute ranking. Existing LS methods use the contextual information of the complex word only in the last step (substitute ranking). However, they overlook two facts: (1) the complexity of a polysemous word depends closely on its context; (2) generating substitutes without regard to context inevitably produces a large number of spurious candidates. We therefore propose LSBert, a novel LS system based on the pretrained language model BERT, which exploits the wider context both when identifying the words in need of simplification and when generating substitute candidates for them. Specifically, LSBert consists of a network for complex word identification obtained by fine-tuning BERT and a BERT-based network for substitute generation. Experimental results show that LSBert performs well on both complex word identification and substitute generation, achieving state-of-the-art results on three benchmarks. To facilitate reproducibility, the code of the LSBert system is available at https://github.com/qiang2100/BERT-LS.
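
As an illustration of the substitute-generation idea described above, the following minimal sketch masks the complex word and reads candidate replacements from BERT's masked-language-model head via the Hugging Face transformers library. The checkpoint name, the sentence-pair encoding of the original and masked sentence, and the candidate filtering are illustrative assumptions, not the paper's exact procedure; consult the linked repository for the authors' implementation.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Load a pretrained BERT masked language model (checkpoint choice is an assumption).
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
model.eval()

def generate_substitutes(sentence, complex_word, top_k=10):
    """Propose substitute candidates for `complex_word` in `sentence` by masking it
    and reading BERT's predictions for the masked slot. The original sentence is
    paired with the masked copy so the model conditions on both the context and
    the complex word itself (a sketch, not the paper's exact setup)."""
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    inputs = tokenizer(sentence, masked, return_tensors="pt")  # sentence pair: original + masked copy
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos[0]].softmax(dim=-1)
    top_ids = torch.topk(probs, top_k).indices.tolist()
    candidates = tokenizer.convert_ids_to_tokens(top_ids)
    # Drop the complex word itself and subword pieces from the candidate list.
    return [c for c in candidates if c != complex_word.lower() and not c.startswith("##")]

print(generate_substitutes("The cat perched on the mat.", "perched"))
```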
