Dataless Text Classification with Descriptive LDA

Xingyuan Chen, John Carroll, Peng Jin, Yunqing Xia

doi:10.1609/aaai.v29i1.9506

Abstract

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is combined with a describing device that infers Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to that of state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
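
The idea in the abstract can be illustrated with a small sketch. This is not the authors' DescLDA implementation; it is a simplified approximation under stated assumptions: each category's description words are turned into a descriptive document by repeating them, those tokens are clamped to the category's topic so that their counts behave like an informative prior on the topic-word distributions, and a collapsed Gibbs sampler then assigns topics to the tokens of the unlabeled documents. The function name `desc_lda_sketch`, the parameter `desc_repeat`, and the hyperparameter values are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def desc_lda_sketch(descriptions, docs, n_iter=200, alpha=0.1, beta=0.01,
                    desc_repeat=20, seed=0):
    """Classify tokenized, unlabeled docs using only category description words.

    descriptions: dict mapping category name -> list of description words.
    docs: list of token lists.
    Returns one predicted category name per document.
    """
    rng = random.Random(seed)
    cats = list(descriptions)
    K = len(cats)
    vocab = {w for ws in descriptions.values() for w in ws}
    vocab |= {w for doc in docs for w in doc}
    V = len(vocab)

    topic_word = [defaultdict(int) for _ in range(K)]  # count of word w in topic k
    topic_total = [0] * K                              # total tokens in topic k

    # Descriptive documents (assumption): repeat each description word and
    # clamp it to its category's topic. These counts are never resampled,
    # so they act as pseudo-counts, i.e. an informative prior.
    for k, cat in enumerate(cats):
        for w in descriptions[cat]:
            topic_word[k][w] += desc_repeat
            topic_total[k] += desc_repeat

    # Random initial topic assignment for tokens of the unlabeled documents.
    doc_topic = [[0] * K for _ in docs]
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(zd)

    # Collapsed Gibbs sampling over the unlabeled tokens only.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Predict the category whose topic dominates each document.
    return [cats[max(range(K), key=lambda t: doc_topic[d][t])]
            for d in range(len(docs))]


# Toy usage with made-up categories and documents.
toy_descriptions = {"sports": ["game", "team", "score"],
                    "space": ["orbit", "nasa", "launch"]}
toy_docs = [["team", "game", "score", "game"],
            ["nasa", "launch", "orbit", "orbit"]]
toy_predictions = desc_lda_sketch(toy_descriptions, toy_docs)
```

Because every topic is tied to one category through its descriptive document, the topic with the largest count in a document can be read directly as the predicted category, and no labeled documents or external knowledge base are needed. The paper's model infers the priors through its describing device rather than through fixed pseudo-counts, so this sketch should be read only as an intuition aid.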

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Feb 19, 2015
Citations: 51

