Abstract

We consider semi-supervised learning, learning from both labeled and unlabeled instances, and in particular self-training with decision tree learners as base learners. We show that standard decision tree learning as the base learner cannot be effective in a self-training algorithm for semi-supervised learning. The main reason is that the basic decision tree learner does not produce reliable probability estimates for its predictions, so these estimates cannot serve as a proper selection criterion in self-training. We consider the effect of several modifications to the basic decision tree learner that produce better probability estimates than the raw class distributions at the leaves of the tree. We show that these modifications do not improve performance when the learner is trained on the labeled data only, but that they do allow the learner to benefit more from the unlabeled data in self-training. The modifications that we consider are the Naive Bayes Tree, a combination of no-pruning and the Laplace correction, grafting, and a distance-based measure. We then extend this improvement to ensembles of decision trees and show that the ensemble learner gives a further improvement over the adapted decision tree learners.
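
As a concrete illustration of the setting, the following is a minimal self-training loop. It is a sketch, not the authors' exact procedure: it assumes scikit-learn's DecisionTreeClassifier as the base learner and numeric feature matrices, and the helper name self_train, the confidence threshold, and the iteration cap are our own choices.

```python
# Minimal self-training sketch (an illustration, not the exact algorithm
# from the paper). The base learner is fit on the labeled data; unlabeled
# instances whose highest predicted class probability exceeds a threshold
# are pseudo-labeled, moved into the training set, and the process repeats.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    X_train, y_train = np.asarray(X_lab), np.asarray(y_lab)
    pool = np.asarray(X_unlab)
    clf = DecisionTreeClassifier().fit(X_train, y_train)
    for _ in range(max_iter):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # no predictions are reliable enough to add
        # Pseudo-label the confident instances with the predicted class.
        pseudo_y = clf.classes_[proba[confident].argmax(axis=1)]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, pseudo_y])
        pool = pool[~confident]
        clf = DecisionTreeClassifier().fit(X_train, y_train)
    return clf
```

Note that an unpruned decision tree typically has pure leaves, so predict_proba returns mostly 0/1 values and the threshold cannot distinguish among predictions; this is exactly the unreliable-confidence problem described above, which the modifications listed in the abstract address.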

Highlights

  • Supervised learning methods are effective when there are sufficient labeled instances

  • We show that standard decision tree learning as the base learner cannot be effective in a self-training algorithm for semi-supervised learning

  • The improvement arises because the Laplace correction and no-pruning give a better ranking of the decision tree's probability estimates, which makes it possible to select a set of high-confidence predictions (see the sketch after this list)
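
A minimal sketch of how these two modifications might be combined, assuming a scikit-learn-style tree (the helper names fit_unpruned_tree and laplace_proba are ours): the tree is grown without pruning, and the raw leaf frequencies are replaced by Laplace-corrected estimates P(c | leaf) = (n_c + 1) / (n + k), which smooths the 0/1 probabilities of pure leaves into a usable confidence ranking.

```python
# Sketch: unpruned tree + Laplace-corrected leaf probabilities.
# P(c | leaf) = (n_c + 1) / (n + k), where n_c is the count of class c at
# the leaf, n the total count at the leaf, and k the number of classes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_unpruned_tree(X, y):
    # No depth or pruning constraints: the tree is grown out fully.
    clf = DecisionTreeClassifier().fit(X, y)
    # Record training class counts per leaf for the Laplace correction.
    k = clf.n_classes_
    counts = {}
    for leaf, label in zip(clf.apply(X), y):
        idx = np.searchsorted(clf.classes_, label)
        counts.setdefault(leaf, np.zeros(k))[idx] += 1
    return clf, counts

def laplace_proba(clf, counts, X):
    # Laplace-corrected class distribution at each instance's leaf.
    k = clf.n_classes_
    out = np.empty((len(X), k))
    for i, leaf in enumerate(clf.apply(X)):
        c = counts[leaf]
        out[i] = (c + 1.0) / (c.sum() + k)
    return out
```

The effect is that pure leaves now differ by size: with k = 2, a leaf holding 50 instances of one class gets probability (50 + 1) / (50 + 2) ≈ 0.98, while a single-instance leaf gets only (1 + 1) / (1 + 2) ≈ 0.67, so the ranking separates confident predictions from fragile ones.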


Introduction

Supervised learning methods are effective when there are sufficient labeled instances. In many applications, however, such as object detection and document and web-page categorization, labeled instances are difficult, expensive, or time-consuming to obtain, because they require empirical research or experienced human annotators. Semi-supervised learning algorithms construct a classifier from both labeled and unlabeled data. The goal of semi-supervised learning is to combine the information in the unlabeled data with the explicit classification information of the labeled data in order to improve classification performance. The main issue in semi-supervised learning is how to exploit the information in the unlabeled data. A number of algorithms for semi-supervised learning have been proposed, such as Expectation Maximization (EM) based algorithms [30, 35], self-training [25, 33, 34, 45], co-training [6, 37], the Transductive Support Vector Machine (TSVM) [23], the Semi-Supervised SVM (S3VM) [4], graph-based methods [2, 48], and boosting-based semi-supervised learning methods [27, 38, 40].
