Abstract

Short text understanding is a key task and an active research topic in natural language processing. Because short texts are sparse and carry limited semantic information, traditional search methods that analyze only the literal text are restricted in short text understanding and similarity matching. In this paper, we propose a method that combines knowledge-based conceptualization with a transformer encoder. Specifically, for each term in a short text, we obtain its concepts and enrich the short text with co-occurrence terms and concepts from a knowledge base, construct a convolutional neural network (CNN) to capture local context information, and introduce a subnetwork structure based on a transformer embedding encoder. We then embed these concepts into a low-dimensional vector space so that the transformer can attend to them. Finally, the understanding model is built from the concept space and the transformer encoder space. Experimental results show that the proposed method effectively captures more of the semantics of short texts and can be applied to a variety of applications, such as short text information retrieval and short text classification.
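As a rough illustration of how a transformer can attend to concept embeddings, the sketch below computes standard scaled dot-product attention for a single query in plain Python. The function names and the toy vectors are assumptions made for this example; the abstract does not specify the paper's actual encoder configuration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  embedding of one term (list of floats)
    keys:   concept embeddings the term is compared against
    values: vectors to be mixed according to the attention weights
    Returns (weights, output).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy 2-dimensional concept embeddings (illustrative values only).
concept_vectors = [[1.0, 0.0], [0.0, 1.0]]
weights, mixed = attend([1.0, 0.0], concept_vectors, concept_vectors)
```

Here the query is closer to the first concept vector, so that concept receives the larger weight and dominates the mixed output.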

Highlights

  • With the rapid development of the Internet, short texts are everywhere, and studies based on short text, such as information extraction and text classification [1], and especially question-answering (QA) systems and short text understanding [2], are receiving increasing attention.

  • We enrich the semantic information of short text through conceptualization, and we propose a novel approach for short text understanding

  • There are two components in our approach: i) introducing textual conceptualization and enriching short texts with co-occurrence terms and concepts; ii) constructing a convolutional neural network (CNN) to automatically learn high-level features and redesigning the subnetwork structure based on a transformer encoder.
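To make the CNN component more concrete, here is a minimal sketch of a single convolution filter sliding over token embeddings, followed by ReLU and max-over-time pooling. This is a generic text-CNN building block written in plain Python, not the authors' network; the function name, filter weights, and toy embeddings are all assumptions.

```python
def conv1d_max_pool(token_vectors, filt):
    """Slide one filter over consecutive tokens, apply ReLU, then max-pool.

    token_vectors: list of equal-length lists, one embedding per token
    filt:          list of weight vectors, one per position in the window
    Returns the strongest activation found in any window.
    """
    width = len(filt)
    if len(token_vectors) < width:
        raise ValueError("text is shorter than the filter window")
    activations = []
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        # Dot product between the filter and the flattened window.
        total = sum(
            w * x
            for weights, vector in zip(filt, window)
            for w, x in zip(weights, vector)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling


# Three toy token embeddings and one filter spanning two tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_max_pool(tokens, bigram_filter)
```

Each filter yields one feature per text regardless of length, which is why this pattern suits short, variable-length inputs; a full model would learn many such filters.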

Introduction

With the rapid development of the Internet, short texts are everywhere, and studies based on short text, such as information extraction and text classification [1], and especially question-answering (QA) systems and short text understanding [2], are receiving increasing attention. The word "Apple" in the short text "Steve Jobs is the founder of Apple" and the word "apple" in "this kind of apple is sweet" have completely different meanings, but it is difficult for a machine to tell the two senses apart. This lack of sufficient semantic information eventually leads to errors in understanding the text, and existing text analysis methods and algorithms are not well suited to short texts [3], [4]. There are two components in our approach: i) introducing textual conceptualization and enriching short texts with co-occurrence terms and concepts; ii) constructing a convolutional neural network (CNN) to automatically learn high-level features and redesigning the subnetwork structure based on a transformer encoder.
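The "Apple" example can be sketched with a toy conceptualization step. The knowledge base, co-occurrence affinities, and scoring rule below are hypothetical stand-ins chosen for illustration; the text above does not state which knowledge base or scoring function the paper uses.

```python
# Hypothetical toy knowledge base: term -> {concept: prior weight}.
# A real system would query a large taxonomy instead of a hard-coded dict.
TERM_CONCEPTS = {
    "apple": {"company": 0.5, "fruit": 0.5},
}

# Hypothetical co-occurrence evidence: context term -> {concept: affinity}.
COOCCURRENCE = {
    "steve jobs": {"company": 0.9, "fruit": 0.1},
    "founder": {"company": 0.8, "fruit": 0.2},
    "sweet": {"company": 0.1, "fruit": 0.9},
}


def conceptualize(term, context_terms):
    """Score each concept of `term` as prior weight times context affinities."""
    scores = dict(TERM_CONCEPTS.get(term, {}))
    for context in context_terms:
        affinity = COOCCURRENCE.get(context, {})
        for concept in scores:
            # Unseen (context, concept) pairs get a neutral factor of 0.5.
            scores[concept] *= affinity.get(concept, 0.5)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {concept: score / total for concept, score in scores.items()}


def best_concept(term, context_terms):
    """Return the highest-scoring concept, or None if the term is unknown."""
    ranked = conceptualize(term, context_terms)
    return max(ranked, key=ranked.get) if ranked else None


company_sense = best_concept("apple", ["steve jobs", "founder"])
fruit_sense = best_concept("apple", ["sweet"])
```

With these toy numbers, the context terms "steve jobs" and "founder" push "apple" toward the company concept, while "sweet" pushes it toward the fruit concept, which is the kind of disambiguation the literal text alone does not provide.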
