Abstract

The classical bag-of-words and probabilistic topic models are widely used for topic classification. Recently, neural networks have achieved remarkable performance and become the mainstream approach, owing to their ability to encode distributed semantic features of documents from word embeddings. To demonstrate the strengths of neural networks, this paper compares Latent Dirichlet Allocation (LDA) with the mainstream neural architectures: the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Recurrent Convolutional Neural Network (RCNN). Beyond this, we combine the latent topic information inferred by LDA with the distributed semantic information learned by neural networks to produce a richer document representation for topic classification. Experimental results show that the proposed representation outperforms each individual system and achieves excellent performance on topic classification tasks.
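A minimal sketch of the combination idea described above, assuming the combined representation is formed by concatenating a document's LDA topic distribution with its neural embedding (the vectors and dimensions below are illustrative placeholders, not the paper's actual data or code):

```python
def combine_representations(lda_topics, neural_embedding):
    """Concatenate latent-topic features (from LDA) with distributed
    semantic features (from a CNN/RNN/RCNN encoder) into one vector."""
    return list(lda_topics) + list(neural_embedding)

# Toy example: a 4-topic LDA posterior and a 6-dimensional neural embedding.
lda_topics = [0.70, 0.10, 0.15, 0.05]
neural_embedding = [0.2, -0.4, 0.1, 0.9, -0.3, 0.5]
doc_vector = combine_representations(lda_topics, neural_embedding)
print(len(doc_vector))  # the combined 10-dimensional document representation
```

The concatenated vector can then be fed to any downstream classifier; the paper's point is that topic-level and semantic-level features are complementary.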
