Vertical Ensemble Co-Training for Text Classification

Gilad Katz, Asaf Shabtai, Cornelia Caragea

doi:10.1145/3137114

Abstract

High-quality labeled data is essential for successfully applying machine learning methods to real-world text classification problems. However, in many cases the amount of labeled data is very small compared to the amount of unlabeled data, and labeling additional samples can be expensive and time-consuming. Co-training algorithms, which make use of unlabeled data to improve classification, have proven to be very effective in such cases. Generally, co-training algorithms work by using two classifiers, trained on two different views of the data, to label large amounts of unlabeled data. Doing so can help minimize the human effort required for labeling new data, as well as improve classification performance. In this article, we propose an ensemble-based co-training approach that uses an ensemble of classifiers from different training iterations to improve labeling accuracy. This approach, which we call vertical ensemble, incurs almost no additional computational cost. Experiments conducted on six textual datasets show a significant improvement of over 45% in AUC compared with the original co-training algorithm.
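The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the classifiers from different iterations are combined, so averaging the positive-class probability over the last `window` iterations is an assumption, as are the names `TinyNB`, `vertical_cotrain`, and `predict`, the tiny Naive Bayes model, and the toy documents. Each document has two views (here, hypothetical "title" tokens and "body" tokens).

```python
import math
from collections import Counter

class TinyNB:
    """Minimal binary multinomial Naive Bayes with Laplace smoothing (illustrative only)."""
    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n = Counter(labels)
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_pos(self, tokens):
        """Return the estimated probability that the document belongs to class 1."""
        logp = {}
        for y in (0, 1):
            # +1 reserves one smoothing slot for tokens never seen in training
            total = sum(self.counts[y].values()) + len(self.vocab) + 1
            logp[y] = math.log((self.n[y] + 1) / (sum(self.n.values()) + 2))
            for t in tokens:
                logp[y] += math.log((self.counts[y][t] + 1) / total)
        m = max(logp.values())
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)

def vertical_cotrain(labeled, unlabeled, iterations=3, per_iter=2, window=2):
    """labeled: list of ((view_a, view_b), y); unlabeled: list of (view_a, view_b)."""
    labeled, pool = list(labeled), list(unlabeled)
    history = [[], []]  # classifiers kept per view, one per training iteration
    for _ in range(iterations):
        ys = [y for _, y in labeled]
        for v in (0, 1):
            history[v].append(TinyNB().fit([x[v] for x, _ in labeled], ys))
        if not pool:
            break
        picked = {}
        for v in (0, 1):
            # Assumed "vertical" step: average over classifiers from recent iterations
            recent = history[v][-window:]
            scores = [sum(c.prob_pos(x[v]) for c in recent) / len(recent) for x in pool]
            ranked = sorted(range(len(pool)), key=lambda i: abs(scores[i] - 0.5), reverse=True)
            for i in ranked[:per_iter]:  # most confident documents for this view
                picked.setdefault(i, int(scores[i] >= 0.5))
        labeled += [(pool[i], y) for i, y in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return history, labeled

def predict(history, x, window=2):
    """Classify a two-view document by averaging both views' recent classifiers."""
    probs = [c.prob_pos(x[v]) for v in (0, 1) for c in history[v][-window:]]
    return int(sum(probs) / len(probs) >= 0.5)

# Toy data, invented for illustration: class 1 = sports, class 0 = politics
seed = [((["goal", "match"], ["team", "score", "win"]), 1),
        ((["vote", "election"], ["party", "senate", "law"]), 0)]
pool = [(["match", "cup"], ["team", "league"]),
        (["election", "poll"], ["senate", "party"]),
        (["goal", "cup"], ["score", "league"]),
        (["vote", "poll"], ["law", "party"])]

history, final_labeled = vertical_cotrain(seed, pool)
print(len(final_labeled), predict(history, (["match", "goal"], ["team", "win"])))
```

Because the classifiers from earlier iterations are already trained as part of ordinary co-training, reusing them only requires keeping them in memory and scoring with them, which is consistent with the abstract's remark that the approach incurs almost no additional computational cost.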


Journal: ACM Transactions on Intelligent Systems and Technology
Publication Date: Oct 25, 2017
Citations: 14

