Text classification from unlabeled documents with bootstrapping and feature projection techniques

Youngjoong Ko, Jungyun Seo

doi:10.1016/j.ipm.2008.07.004

Abstract

Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, which is generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easy to collect, labeled documents are difficult to generate because the labeling must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
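
The idea of starting from nothing but unlabeled documents and one title word per category can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no details of their bootstrapping or feature projection steps, so the sketch swaps in a plain centroid classifier with cosine similarity. All function names, the toy documents, and the title words are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def bootstrap_labels(docs, title_words):
    """Seed step: label a document only if it mentions exactly one title word."""
    seeds = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = [cat for cat, word in title_words.items() if word in tokens]
        if len(hits) == 1:
            seeds[i] = hits[0]
    return seeds


def build_centroids(docs, labels):
    """Sum the term counts of the seed-labeled documents for each category."""
    centroids = {}
    for i, cat in labels.items():
        centroids.setdefault(cat, Counter()).update(tokenize(docs[i]))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = Counter(tokenize(doc))
    return max(centroids, key=lambda cat: cosine(vec, centroids[cat]))


# Toy, made-up corpus: no document carries a human-assigned label.
docs = [
    "the baseball team won the game",
    "the pitcher threw a fast ball in the game",
    "the election results surprised the politics experts",
    "voters chose a new senator in the election",
]
title_words = {"sports": "baseball", "politics": "politics"}

seeds = bootstrap_labels(docs, title_words)
centroids = build_centroids(docs, seeds)
predictions = {i: classify(doc, centroids) for i, doc in enumerate(docs)}
```

In this toy run only the first and third documents contain a title word, so they become the seed set, and the remaining two are labeled by similarity to the seed centroids. A fuller bootstrapping loop would fold confident predictions back into the labeled set and repeat.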

Journal: Information Processing and Management
Publication Date: Sep 11, 2008
Citations: 70
