Multiple-rule bias in the comparison of classification rules

Mohammadmahdi R. Yousefi, Jianping Hua, Edward R. Dougherty

doi:10.1093/bioinformatics/btr262

Open Access

Abstract

There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two practices contribute to overoptimism in classification: (i) reporting results on datasets for which a proposed classification rule performs well, and (ii) comparing multiple classification rules on a single dataset, a comparison that purports to show the advantage of a certain rule. This article provides a careful probabilistic analysis of the second issue and of the resulting 'multiple-rule bias', which arises from choosing the classification rule having minimum estimated error on the dataset. It quantifies this bias as it affects the estimation of the expected true error of the classification rule possessing minimum estimated error, and it characterizes the bias incurred when the true comparative advantage of the chosen rule relative to the others is estimated by its estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators. We have implemented the synthetic data distribution model, classification rules, feature selection routines and error estimation methods in C; the code for the multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included.
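The selection effect behind the multiple-rule bias can be illustrated with a small Monte Carlo sketch. This is a toy model, not the authors' C/MATLAB code: classifier training is skipped, each of m rules is assumed to have a fixed true error, and its error estimate is assumed to be the misclassification rate on an independent hold-out set of `n_test` points. The function name `simulate_multiple_rule_bias` and the definition of comparative advantage used here (mean error of the other rules minus the error of the chosen rule) are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate_multiple_rule_bias(true_errors, n_test, n_trials, seed=0):
    """Toy Monte Carlo of picking the rule with minimum estimated error.

    Hypothetical model (not the paper's): each rule has a fixed true error,
    and its estimate is the error rate on an independent hold-out set of
    n_test points, i.e. Binomial(n_test, true_error) / n_test.
    """
    m = len(true_errors)
    if m < 2:
        raise ValueError("need at least two rules to compare")
    rng = random.Random(seed)
    est_chosen, true_chosen, est_adv, true_adv = [], [], [], []
    for _ in range(n_trials):
        # Hold-out estimate per rule: fraction of test points misclassified.
        est = [sum(rng.random() < e for _ in range(n_test)) / n_test
               for e in true_errors]
        # Select the rule with minimum estimated error (ties -> lowest index).
        k = min(range(m), key=lambda i: est[i])
        others = [i for i in range(m) if i != k]
        est_chosen.append(est[k])
        true_chosen.append(true_errors[k])
        # Hypothetical comparative advantage: mean error of the other rules
        # minus the error of the chosen rule (estimated vs. true).
        est_adv.append(mean(est[i] for i in others) - est[k])
        true_adv.append(mean(true_errors[i] for i in others) - true_errors[k])
    return {
        "mean_est_error": mean(est_chosen),
        "mean_true_error": mean(true_chosen),
        # Negative bias = the chosen rule's error estimate is optimistic.
        "error_bias": mean(est_chosen) - mean(true_chosen),
        "mean_est_advantage": mean(est_adv),
        "mean_true_advantage": mean(true_adv),
    }


if __name__ == "__main__":
    # Five rules with identical true error: no rule is actually better.
    print(simulate_multiple_rule_bias([0.25] * 5, n_test=50, n_trials=2000))
```

With five rules of identical true error, `error_bias` should come out negative and `mean_est_advantage` positive, even though `mean_true_advantage` is exactly zero: selecting the minimum estimate is by itself enough to make the chosen rule look better than it is. The setting analyzed in the article, with all rules applied to a single dataset, introduces dependence between the estimates that this independent hold-out toy does not model.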

Journal: Bioinformatics | Publication Date: May 5, 2011
Citations: 13

Similar Papers

Performance reproducibility index for classification
Edward R. Dougherty ... Mohammadmahdi R. Yousefi
Bioinformatics | VOL. 28
06 Sep 2012

Comparison of Four Classification Methods on Small-Sample-Size Synthetic RNA-seq Data
Felitsiya Shakola ... Ivan Ivanov
01 Jan 2023

Bolstered error estimation
Ulisses Braga-Neto ... Edward Dougherty
Pattern Recognition | VOL. 37
10 Feb 2004

On the Comparison of Classifiers for Microarray Data
Blaise Hanczar ... Edward Dougherty
Current Bioinformatics | VOL. 5
01 Mar 2010
