Abstract

With the rapid growth of the internet, large amounts of short text data are generated in daily life. Chinese short text matching is an important natural language processing task, but it still faces challenges such as the ambiguity of Chinese words and an imbalanced ratio of samples in the training set. To address these problems, we propose an external knowledge and data augmentation enhanced model (EDM) for Chinese short text matching. EDM uses jieba, thulac, and ltp to generate word lattices and employs HowNet as an external knowledge source for disambiguation. Additionally, a hybrid of dropout and EDA is adopted for data augmentation to balance the proportion of samples in the training set. Experimental results on two Chinese short text datasets show that EDM outperforms most existing models. Ablation experiments also demonstrate that both external knowledge and data augmentation significantly improve the model's performance.
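
To make the idea of a word lattice concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lattice can be represented as the union of the word spans proposed by several segmenters. The function name `build_lattice`, the example sentence, and the three hand-written segmentations are all hypothetical; they stand in for real segmenter outputs and are not actual results from jieba, thulac, or ltp.

```python
def build_lattice(sentence, segmentations):
    """Union the word spans of several segmentations into one lattice.

    Each span is a (start, end, word) tuple with character offsets into
    `sentence`. This representation is an assumption for illustration.
    """
    spans = set()
    for words in segmentations:
        # Each segmentation must reproduce the sentence exactly.
        if "".join(words) != sentence:
            raise ValueError("segmentation does not cover the sentence")
        start = 0
        for word in words:
            end = start + len(word)
            spans.add((start, end, word))
            start = end
    return sorted(spans)


# Hypothetical segmentations of one ambiguous sentence (hand-written,
# standing in for the outputs of three different segmenters).
sentence = "南京市长江大桥"
segmentations = [
    ["南京市", "长江大桥"],
    ["南京", "市长", "江大桥"],
    ["南京市", "长江", "大桥"],
]

lattice = build_lattice(sentence, segmentations)
for start, end, word in lattice:
    print(start, end, word)
```

Keeping every candidate span, rather than committing to a single segmentation, leaves the choice among competing words such as "市长" and "长江" to a later disambiguation step, which is the role the abstract assigns to HowNet.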
