Abstract
Short text classification is a challenging task in natural language processing. Traditional methods that use external knowledge to address the sparsity and ambiguity of short texts have achieved good results, but their accuracy remains limited because they ignore context-relevant features. Deep learning methods based on RNNs or CNNs have therefore become increasingly popular for short text classification. However, RNN-based methods are hard to parallelize, which lowers their efficiency, while CNN-based methods ignore word order and the relationships between words, which lowers their effectiveness. Motivated by this, we propose CRFA, a novel short text classification approach that combines Context-Relevant Features with a multi-stage Attention model based on the Temporal Convolutional Network (TCN) and CNN. First, we use Probase as external knowledge to enrich the semantic representation of short texts, mitigating their data sparsity and ambiguity. Second, we design a multi-stage attention model based on TCN and CNN: TCN improves the parallelization of the model for higher efficiency, and discriminative features are obtained at each stage by fusing attention with CNNs at different levels for higher accuracy. Specifically, TCN captures context-related features at the word and concept levels, and, to measure the importance of features, Word-level TCN (WTCN) based attention, Concept-level TCN (CTCN) based attention, and different-level CNNs are used at each stage to focus on the more important features. Finally, experiments demonstrate the effectiveness and efficiency of our approach compared with several well-known short text classification approaches based on CNN and RNN.
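As a rough illustration of the two building blocks named above, the following NumPy sketch shows a causal dilated 1-D convolution (the core operation of a TCN) followed by a simple dot-product attention pooling over the resulting sequence. It is not the CRFA implementation: the function names `causal_dilated_conv1d` and `attention_pool`, the tensor shapes, the single-layer setup, and the dot-product scoring are all assumptions made for illustration only.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated convolution over a sequence (illustrative sketch).

    x: (T, d_in) sequence of word or concept embeddings.
    w: (k, d_in, d_out) kernel with k taps.
    Returns (T, d_out); output t depends only on x[t], x[t - dilation], ...
    """
    k, d_in, d_out = w.shape
    T = x.shape[0]
    pad = (k - 1) * dilation
    # Pad on the left only, so no output position can see future inputs.
    xp = np.vstack([np.zeros((pad, d_in)), x])
    out = np.zeros((T, d_out))
    for j in range(k):
        # Tap j looks back (k - 1 - j) * dilation steps from position t.
        start = j * dilation
        out += xp[start:start + T] @ w[j]
    return out


def attention_pool(h, q):
    """Dot-product attention pooling (assumed scoring function).

    h: (T, d) feature sequence, q: (d,) query vector.
    Returns the weighted sum (d,) and the attention weights (T,).
    """
    scores = h @ q
    scores = scores - scores.max()  # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ h, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    words = rng.normal(size=(6, 4))     # 6 tokens, 4-dim embeddings (made up)
    kernel = rng.normal(size=(3, 4, 5)) # 3 taps, 4 -> 5 channels
    features = causal_dilated_conv1d(words, kernel, dilation=2)
    pooled, weights = attention_pool(features, rng.normal(size=5))
    print(features.shape, pooled.shape, weights.shape)
```

Because every output position is computed from a fixed window of past inputs, all positions can be evaluated at once rather than step by step as in an RNN, which is the parallelization advantage the abstract attributes to TCN. In a two-branch setup such as the one described, one might run the same kind of block separately over word embeddings and concept embeddings and combine the pooled vectors, but how CRFA fuses them is not specified here.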