Short text classification using semantically enriched topic model

Farid Uddin, Zuping Zhang, Yibo Chen, Xin Huang

doi:10.1177/01655515241230793

Abstract

Modelling short text is challenging because sparse word co-occurrence and insufficient semantic information degrade downstream Natural Language Processing (NLP) tasks such as text classification. Gathering information from external sources is expensive and may introduce noise. For efficient short text classification without depending on external knowledge sources, we propose Expressive Short text Classification (EStC). EStC consists of a novel document context-aware, semantically enriched topic model called the Short text Topic Model (StTM), which captures word, topic and document semantics in a joint learning framework. In StTM, the probability of predicting a context word involves the topic distribution of word embeddings and the document vector as the global context. The document vector is obtained on the fly by weighted averaging of word embeddings, simultaneously with the topic distribution of words, so no additional inference method is required for the document embedding. EStC represents documents in an expressive (number of topics × number of word embedding features) embedding space and uses a linear support vector machine (SVM) classifier for their classification. Experimental results on several publicly available short text data sets demonstrate that EStC outperforms many state-of-the-art language models in short text classification.
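
To make the two representations mentioned in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the per-word weights, the per-word topic probabilities and the averaging used to build the (number of topics × number of embedding features) matrix are all hypothetical, since the abstract does not specify them. In StTM these quantities would be learned jointly; here they are simply passed in.

```python
from typing import Dict, List


def document_vector(tokens: List[str],
                    embeddings: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Weighted average of word embeddings, used as a global document context.

    Assumption: `weights` maps each word to a non-negative weight; words
    without an entry get weight 1.0, and words without an embedding are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue
        w = weights.get(tok, 1.0)
        weight_sum += w
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def topic_embedding_features(tokens: List[str],
                             embeddings: Dict[str, List[float]],
                             topic_probs: Dict[str, List[float]]) -> List[float]:
    """Flattened (num_topics x dim) document representation.

    Assumption: row k is the mean over known words of p(topic k | word) times
    the word's embedding. This is one plausible construction, not necessarily
    the one used by EStC.
    """
    dim = len(next(iter(embeddings.values())))
    num_topics = len(next(iter(topic_probs.values())))
    rows = [[0.0] * dim for _ in range(num_topics)]
    known = [t for t in tokens if t in embeddings and t in topic_probs]
    for tok in known:
        for k, p in enumerate(topic_probs[tok]):
            for i, x in enumerate(embeddings[tok]):
                rows[k][i] += p * x
    n = max(len(known), 1)
    # Flatten row by row so the result has num_topics * dim entries.
    return [x / n for row in rows for x in row]


# Toy data (hypothetical): two words, 2-d embeddings, two topics.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
toy_weights = {"cat": 3.0, "dog": 1.0}
toy_topic_probs = {"cat": [1.0, 0.0], "dog": [0.5, 0.5]}

doc = ["cat", "dog"]
context = document_vector(doc, toy_embeddings, toy_weights)
features = topic_embedding_features(doc, toy_embeddings, toy_topic_probs)
```

The flattened `features` vector has one entry per (topic, embedding feature) pair, which is the kind of fixed-length input a linear SVM classifier could consume.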