Comparative evaluation of set-level techniques in predictive classification of gene expression samples

Matěj Holec, Jiří Kléma, Filip Železný, Jakub Tolar

doi:10.1186/1471-2105-13-s10-s15

Open Access

Abstract

Background: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than traditional methods that rely on individual genes. The set-level strategy can also be adopted, with similar benefits, in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments.

Results: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate the expressions of genes into a feature value, the singular value decomposition (SVD) method and the SetSig technique both improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step.

Conclusion: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients.

Availability: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
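The abstract compares ways of collapsing the expression values of one gene set into a single feature per sample. The sketch below contrasts arithmetic averaging with an SVD-based aggregate, taken here to mean each sample's score on the first principal component of the set's centred expression values. That reading of "SVD aggregation", the function names, and the power-iteration solver are assumptions for illustration, not the authors' implementation; SetSig is not shown.

```python
import math
from statistics import mean


def average_feature(X, genes):
    """Arithmetic-average aggregation: one value per sample (row of X)."""
    return [mean(row[g] for g in genes) for row in X]


def svd_feature(X, genes, iters=200):
    """First-principal-component aggregation (assumed reading of 'SVD').

    X is a samples-by-genes matrix (list of rows); genes lists the column
    indices belonging to one gene set.
    """
    # Restrict to the gene set and centre each gene's column.
    sub = [[row[g] for g in genes] for row in X]
    n, k = len(sub), len(genes)
    centres = [mean(sub[i][j] for i in range(n)) for j in range(k)]
    C = [[sub[i][j] - centres[j] for j in range(k)] for i in range(n)]

    # Gram matrix G = C^T C; its leading eigenvector is the leading
    # right singular vector of C.
    G = [[sum(C[i][a] * C[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]

    # Power iteration from a non-uniform start vector.
    v = [1.0 + 0.1 * j for j in range(k)]
    for _ in range(iters):
        w = [sum(G[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant genes: no variation to summarise
            break
        v = [x / norm for x in w]

    # Singular vectors are defined only up to sign; flip so the loadings
    # sum to a non-negative value (an arbitrary convention assumed here).
    if sum(v) < 0:
        v = [-x for x in v]

    # Each sample's feature is its projection onto that direction.
    return [sum(C[i][j] * v[j] for j in range(k)) for i in range(n)]
```

In a toy set whose genes move together, both aggregates rank the samples identically. They diverge when genes in a set vary with different magnitudes or in opposite directions, where averaging can cancel the signal while the principal component still captures it.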

Highlights

We propose that the predictive classification setting, supported by cross-validation for unbiased accuracy estimation as adopted in this paper, provides the framework needed for an objective comparative assessment of gene set selection techniques.
In the single-gene-set case, when aggregation is applied (i.e., Factor 4 in Table 1 is set to a value other than None; see the first example in Figure 3), each sample is represented by a single real-valued feature, and learning essentially reduces to finding a threshold value for that feature.
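The highlights note that, with a single aggregated gene set, learning reduces to choosing a threshold, and that accuracy is estimated by cross-validation. A minimal sketch of both steps follows, assuming binary 0/1 labels, an exhaustive midpoint search (a decision stump), and simple interleaved, unstratified k-fold splits. The function names are hypothetical, and this is not the set of learners used in the study.

```python
def predict(model, value):
    """Apply a threshold rule (threshold, positive_above) to one value."""
    threshold, positive_above = model
    above = value > threshold
    return int(above) if positive_above else int(not above)


def fit_threshold(values, labels):
    """Fit a decision stump on one real-valued feature with 0/1 labels.

    Candidate thresholds are midpoints between consecutive distinct values;
    both orientations are tried and the rule with the fewest training
    errors is kept (ties go to the first candidate found).
    """
    pts = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])] or [pts[0]]
    best = None
    for t in candidates:
        for positive_above in (True, False):
            model = (t, positive_above)
            errors = sum(predict(model, v) != y for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, model)
    return best[1]


def cross_validated_accuracy(values, labels, k=5):
    """k-fold CV accuracy using simple interleaved, unstratified folds."""
    n = len(values)
    correct = 0
    for f in range(k):
        test_idx = list(range(f, n, k))  # every k-th sample, offset f
        held_out = set(test_idx)
        train_v = [values[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = fit_threshold(train_v, train_y)  # fit on training folds only
        correct += sum(predict(model, values[i]) == labels[i] for i in test_idx)
    return correct / n
```

Fitting the threshold inside each training split, rather than once on all samples, is what keeps the held-out estimate from being optimistically biased; real experiments would typically also shuffle and stratify the folds.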

Summary

Introduction

Set-level techniques have recently attracted significant attention in the area of gene expression data analysis [1,2,3,4,5,6,7]. Learned classifiers may take diverse forms, ranging from geometrically conceived models such as Support Vector Machines [11], which have been especially popular in the gene expression domain, to symbolic models such as logical rules or decision trees, which have also been applied in this area [12,13,14].



Journal: BMC Bioinformatics
Publication Date: Jun 1, 2012
Citations: 55
License type: CC BY

Similar Papers

Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data
Kristina M Hettne ... Dorien A M Van Dartel
BMC Medical Genomics | VOL. 6 | 29 Jan 2013

Comparative Evaluation of Set-Level Techniques in Microarray Classification
Jiri Klema ... Jakub Tolar
01 Jan 2010

Analyzing gene expression data in terms of gene sets: methodological issues
Jelle J Goeman ... Peter Bühlmann
Bioinformatics | VOL. 23 | 15 Feb 2007

Gene Set Databases
Farhad Maleki ... Ian McQuillan
04 Sep 2019