Is bagging effective in the classification of small-sample genomic and proteomic data?

T. T. Vu, U. M. Braga-Neto

doi:10.1155/2009/158368

Abstract

There has been considerable recent interest in applying bagging to the classification of both gene-expression data and protein-abundance mass spectrometry data. The approach is often justified by the improvement it produces in the performance of unstable, overfitting classification rules in small-sample settings. However, the question of real practical interest is whether the ensemble scheme improves the performance of those classifiers enough to beat single stable, nonoverfitting classifiers on small-sample genomic and proteomic data sets. To investigate this question, we conducted a detailed empirical study using publicly available data sets from published genomic and proteomic studies. We observed that, under t-test and RELIEF filter-based feature selection, bagging generally does a good job of improving the performance of unstable, overfitting classifiers, such as CART decision trees and neural networks, but that the improvement was not sufficient to beat the performance of single stable, nonoverfitting classifiers, such as diagonal and plain linear discriminant analysis or 3-nearest neighbors. Furthermore, as expected, bagging did not significantly improve the performance of the stable classifiers themselves. Representative experimental results are presented and discussed in this work.
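The abstract names t-test filter-based feature selection without spelling it out. A minimal sketch follows, assuming two classes labeled 0 and 1 and the Welch (unequal-variance) form of the t-statistic; the study's exact variant may differ, and the function name `t_test_filter` and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def t_test_filter(X, y, d):
    """Return the indices of the d features with the largest |t| statistic.

    X is a list of samples (each a list of feature values) and y holds
    class labels 0 or 1. Each class needs at least two samples so that
    the sample variance is defined. Uses the Welch (unequal-variance)
    t-statistic, an illustrative assumption.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [x[j] for x, label in zip(X, y) if label == 0]
        b = [x[j] for x, label in zip(X, y) if label == 1]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        # Zero variance in both classes is scored 0 here to avoid dividing
        # by zero (a simplification)
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    # Rank features by |t|, largest first; ties keep the lower index first
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:d]


# Toy example (hypothetical data): feature 0 separates the classes best
X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [3.0, 5.0, 0.2], [3.2, 4.9, 0.0]]
y = [0, 0, 1, 1]
selected = t_test_filter(X, y, 2)
```

Because the filter scores each feature independently of the classifier, the same selected subset can be handed to any of the classification rules compared in the study.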

Highlights

We present results from a comprehensive empirical study concerning the effect of bagging on the performance of several classification rules, including diagonal and plain linear discriminant analysis, 3-nearest neighbors, CART decision trees, and neural networks, using real data from published microarray and mass spectrometry studies.
We considered in our experiment several classification rules, listed here in order of complexity: diagonal linear discriminant analysis (DLDA), linear discriminant analysis (LDA), 3-nearest neighbors (3NN), decision trees (CART), and neural networks (NNET) [26, 27].
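Diagonal linear discriminant analysis, the simplest of the rules listed above, can be sketched as follows. This assumes a Gaussian model with a shared diagonal covariance (one pooled within-class variance per feature) and class priors estimated from training frequencies; `fit_dlda`, `predict_dlda`, and the toy data are illustrative, not taken from the study.

```python
import math
from statistics import mean


def fit_dlda(X, y):
    """Fit diagonal LDA: class means, class priors, pooled per-feature variances."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means = {
        c: [mean(x[j] for x, lab in zip(X, y) if lab == c) for j in range(p)]
        for c in classes
    }
    priors = {c: y.count(c) / n for c in classes}
    # Pooled within-class variance of each feature; off-diagonal covariances
    # are ignored, which is what makes the rule "diagonal".
    # Assumes every feature has nonzero pooled variance.
    pooled = [
        sum((x[j] - means[lab][j]) ** 2 for x, lab in zip(X, y)) / (n - len(classes))
        for j in range(p)
    ]
    return classes, means, priors, pooled


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared
    distance to its mean, adjusted by the log prior."""
    classes, means, priors, pooled = model

    def score(c):
        dist = sum((x[j] - means[c][j]) ** 2 / pooled[j] for j in range(len(x)))
        return dist - 2 * math.log(priors[c])

    return min(classes, key=score)


# Toy example (hypothetical data)
X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [3.0, 5.0, 0.2], [3.2, 4.9, 0.0]]
y = [0, 0, 1, 1]
model = fit_dlda(X, y)
```

Estimating only one variance per feature, rather than a full covariance matrix, is one reason such a rule tends to be stable when samples are far fewer than features.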

Summary

Introduction

Randomized ensemble methods for classifier design combine the decision of an ensemble of classifiers designed on randomly perturbed versions of the available data [1,2,3,4,5]. The combination is often done by majority voting among the individual classifier decisions [4,5,6], whereas the data perturbation usually employs the bootstrap resampling approach, which corresponds to sampling uniformly with replacement from the original data [7, 8]. The combination of bootstrap resampling and majority voting is known as bootstrap aggregation, or bagging [4, 5]. There is scant theoretical justification for the use of this heuristic, other than the expectation that combining the decisions of several classifiers will regularize and improve the performance of unstable, overfitting classification rules, such as unpruned decision trees, provided one uses a large enough number of classifiers in the ensemble [4, 5]. It is claimed that ensemble rules "do not overfit," meaning that the classification error converges as the number of component classifiers tends to infinity [5].
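The bootstrap-plus-majority-vote scheme described above can be sketched in plain Python. The base learner here is a hypothetical one-split decision stump, chosen only to keep the example short (the study itself used CART, neural networks, and other rules); `bagging`, `fit_stump`, the default ensemble size of 51, and the toy data are all assumptions.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Hypothetical base learner: a one-split decision stump for labels 0/1.

    Exhaustively picks the (feature, threshold, left label) with the fewest
    training errors; points with x[j] <= threshold get the left label.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for left in (0, 1):
                errs = sum((left if x[j] <= t else 1 - left) != lab
                           for x, lab in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, t, left)
    _, j, t, left = best
    return lambda x: left if x[j] <= t else 1 - left


def bagging(fit, X, y, n_estimators=51, seed=0):
    """Bootstrap aggregation: train n_estimators classifiers on bootstrap
    samples and combine their decisions by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: n indices drawn uniformly with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Majority vote over the component classifiers
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy example (hypothetical data): one feature, two well-separated classes
X = [[0.0], [0.2], [0.4], [0.6], [2.0], [2.2], [2.4], [2.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = bagging(fit_stump, X, y)
```

An odd ensemble size avoids voting ties in the two-class case; any function that takes `(X, y)` and returns a callable classifier can be passed as `fit`.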



Journal: EURASIP Journal on Bioinformatics and Systems Biology
Publication Date: Jan 1, 2009
Citations: 29
License type: cc-by


Similar Papers

Applications of Signal Processing Techniques to Bioinformatics, Genomics, and Proteomics
Erchin Serpedin ... Ulisses Braga-Neto
EURASIP Journal on Bioinformatics and Systems Biology | VOL. 2009 | 01 Jan 2009

Combined Proteome and Metabolite-profiling Analyses Reveal Surprising Insights into Yeast Sulfur Metabolism
Alexandra Lafaye ... Jean Labarre
Journal of Biological Chemistry | VOL. 280 | 01 Jul 2005

Novel gene sets improve set-level classification of prokaryotic gene expression data
Matěj Holec ... Filip Železný
BMC Bioinformatics | VOL. 16 | 28 Oct 2015

Comparison of Support Vector Machines to Other Classifiers Using Gene Expression Data
Grace S Shieh ... Yu-Shan Shih
Communications in Statistics - Simulation and Computation | VOL. 35 | 01 Jan 2006
