Abstract

Active learning is a class of machine learning algorithms that autonomously select the data samples from which they will learn. It has been widely used in many data mining fields, such as text classification, in which large amounts of unlabelled data samples are available but labels are hard to obtain. In this paper, an improved active learning algorithm is proposed that takes advantage of the distribution of the datasets to reduce the labelling cost and increase accuracy. Before the active learning process, a spectral clustering algorithm is applied to divide the dataset into two categories, and instances located at the boundary between the two categories are labelled to train the initial classifier. To reduce the computational cost, an incremental method is incorporated into the proposed algorithm. The algorithm is applied to several text classification problems, and the results show that it is more effective and more accurate than the traditional active learning algorithm.
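The initialisation step can be illustrated with a short sketch, assuming scikit-learn's SpectralClustering and a dense feature matrix. The distance-to-centroid rule used below to identify "boundary" instances is an assumption on the reader's part, since the abstract does not state the exact boundary criterion, and the function name and parameters are illustrative rather than taken from the paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def initial_query_indices(X, n_init=10, random_state=0):
    """Cluster X into two categories and return the indices of the points
    lying closest to the border between them (candidates for initial labelling)."""
    labels = SpectralClustering(n_clusters=2,
                                random_state=random_state).fit_predict(X)
    # Centroid of each of the two clusters.
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in (0, 1)])
    # Distance of every point to both centroids.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # A small gap between the two distances means the point sits near the
    # border of the two clusters (assumed proxy for "boundary" instances).
    border_score = np.abs(dist[:, 0] - dist[:, 1])
    return np.argsort(border_score)[:n_init]
```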

Highlights

  • In the last few years, active learning [1] has become increasingly popular because of its effectiveness, especially for learning tasks where the class label of each data sample is difficult to obtain while unlabeled data are plentiful or easy to collect

  • By applying active learning algorithms, the most informative samples are selected in order to learn the correct classifier with fewer labeled data samples

  • Before the start of active learning, the whole dataset is clustered into two categories, and the instances located on the border of the two categories are picked as the initial support vectors; during the learning process, the points closest to the hyperplane are chosen as new instances of the training set


Summary

Introduction

In the last few years, active learning [1] has become increasingly popular because of its effectiveness, especially for learning tasks where the class label of each data sample is difficult to obtain while unlabeled data are plentiful or easy to collect. If the current hyperplane lies far away from the optimal one, the instances selected according to the current model will be of little use for updating the model and finding the correct hyperplane. This cost is due to ignoring the distribution of the training data. Before the start of active learning, the whole dataset is clustered into two categories, and the instances located on the border of the two categories are picked as the initial support vectors; during the learning process, the points closest to the hyperplane are chosen as new instances of the training set. The effect of this algorithm is shown in the results of applying it to several text classification problems.
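As a rough illustration of the selection rule described above, the sketch below wraps a margin-based query loop around scikit-learn's linear SVC: the seed set comes from the clustering step, and in each round the unlabeled points closest to the current hyperplane are queried. The incremental update mentioned in the abstract is replaced here by full retraining for brevity, so this is a minimal sketch rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def margin_based_active_learning(X, oracle_labels, init_idx,
                                 n_rounds=20, batch_size=5):
    """Start from the clustering-based seed set `init_idx`, then repeatedly
    query the unlabeled points closest to the current hyperplane."""
    labeled = set(int(i) for i in init_idx)        # seed set must contain both classes
    clf = SVC(kernel="linear")
    for _ in range(n_rounds):
        idx = np.fromiter(labeled, dtype=int)
        clf.fit(X[idx], oracle_labels[idx])        # full retrain (no incremental update)
        unlabeled = np.array([i for i in range(len(X)) if i not in labeled])
        if unlabeled.size == 0:
            break
        # |decision_function| measures how close a point lies to the hyperplane;
        # the smallest values correspond to the most informative samples.
        margin = np.abs(clf.decision_function(X[unlabeled]))
        query = unlabeled[np.argsort(margin)[:batch_size]]
        labeled.update(int(i) for i in query)      # "ask the oracle" for these labels
    return clf
```

In practice the queried labels would come from a human annotator; here `oracle_labels` simply stands in for that oracle so the loop is runnable end to end.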

Active learning based on spectral clustering
Support vector machine
Application to text classification
Findings
Conclusion
