Abstract
The advancement of deep learning and neural networks has led to the widespread adoption of neural word embeddings as a prominent lexical representation method in natural language processing. Because neural language models are trained on the contextual information of large-scale text, the word embeddings they produce capture semantic correlation in the semantic space while largely ignoring semantic similarity. Training such models also incurs high computational and time costs. To better inject semantic similarity into the distributional space and reduce time cost, we apply post-processing to neural word embeddings using deep metric learning. This paper proposes a lexical enhancement method based on flexible margins and multiple-sample learning. The method embeds lexical entailment constraint relations into neural word embeddings. It categorizes the set of lexical constraints, penalizes negative samples to different degrees according to the gap between categories, and allows positive and negative samples to learn from each other in the distributional space. The proposed method significantly improves neural word embeddings: on word similarity evaluation, benchmark accuracy improves to 75%. The method is also highly competitive on text similarity and text categorization tasks. These results provide strong support for further applications.
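The abstract does not give the loss function, so the following is only a minimal sketch of how a flexible-margin, multiple-negative objective of this general kind could look. It assumes cosine distance, a hinge loss, and a margin that scales linearly with a hypothetical per-negative `category_gaps` value; the function names, `base_margin`, and the linear scaling are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) along the last axis."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)


def flexible_margin_loss(anchor, positive, negatives, category_gaps,
                         base_margin=0.2):
    """Hinge loss over several negatives, each with its own margin.

    anchor, positive: vectors of shape (d,)
    negatives:        array of shape (k, d), one row per negative sample
    category_gaps:    shape (k,), hypothetical gap between the category of
                      the positive pair and that of each negative
                      (assumption: a larger gap means a stronger penalty)
    """
    d_pos = cosine_distance(anchor, positive)             # scalar
    d_neg = cosine_distance(anchor[None, :], negatives)   # shape (k,)
    # Assumed rule: the margin grows linearly with the category gap.
    margins = base_margin * np.asarray(category_gaps, dtype=float)
    # A negative contributes loss only when it is not at least `margin`
    # farther from the anchor than the positive is.
    return float(np.sum(np.maximum(0.0, margins + d_pos - d_neg)))
```

In this sketch, a negative sample from a more distant category must be pushed farther from the anchor before its loss term reaches zero, which is one simple way to penalize negatives "to different degrees."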
More From: Engineering Applications of Artificial Intelligence