Abstract
Manually labeling documents is tedious and expensive, yet it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing works mainly center on single-label classification, that is, each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. Given a few seed words relevant to each category, SMTM performs multi-label classification over a collection of documents without any labeled document. In SMTM, each category is associated with a single category-topic that covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM using a spike-and-slab prior and a weak smoothing prior; that is, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision of the seed words, we propose a seed-guided biased generalized Pólya urn (GPU) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
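As a rough illustration of the generalized Pólya urn idea referenced above (a minimal sketch, not the authors' exact sampling procedure), the snippet below shows how a GPU count update differs from a standard urn: assigning a word to a category-topic also adds fractional counts for the words it promotes, such as words related to that category's seed words. All identifiers (`gpu_increment`, `promotion`) and the promotion weights are hypothetical.

```python
import numpy as np

def gpu_increment(topic_word_counts, topic, word, promotion, amount=1.0):
    """Generalized Polya urn update: add `amount` mass for (topic, word) and
    promoted fractional mass for words related to `word` (hypothetical weights)."""
    for related_word, weight in promotion.get(word, {}).items():
        topic_word_counts[topic, related_word] += amount * weight
    topic_word_counts[topic, word] += amount

# Toy usage: 2 category-topics, vocabulary of 5 words; word 0 acts as a seed
# word that promotes word 1 whenever it is assigned to topic 0.
counts = np.zeros((2, 5))
promotion = {0: {1: 0.3}}  # hypothetical promotion weights
gpu_increment(counts, topic=0, word=0, promotion=promotion)
print(counts[0])  # word 0 receives 1.0, promoted word 1 receives 0.3
```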