Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery.

Michael Netzer, Christian Baumgartner, Daniel Baumgarten

doi:10.1371/journal.pone.0276607

Abstract

High-throughput technologies in genomics enable the analysis of small alterations in gene expression levels. Patterns of such deviations are an important starting point for the discovery and verification of new biomarker candidates. Identifying such patterns is a challenging task that requires sophisticated machine learning approaches. A variety of classification models is currently available, and a common approach is to compare their performance and select the best one for a given classification problem. Since the association between the features of a data set and the performance of a particular classification method is still not fully understood, the main contribution of this work is to provide a new methodology for predicting the prediction results of different classifiers in the field of biomarker discovery. We propose here a three-step computational workflow that includes an analysis of the data set characteristics, the calculation of the classification accuracy and, finally, the prediction of the resulting classification error. The experiments were carried out on synthetic and microarray datasets. Using this method, we showed that the predictability strongly depends on the discriminatory ability of the features, e.g., sets of genes, in two-class or multi-class datasets. If a dataset has a certain discriminatory ability, this method enables prediction of the classification performance before applying a learning model. Thus, our results contribute to a better understanding of the relationship between dataset characteristics and the corresponding performance of a machine learning method, and suggest the optimal classification method for a given dataset based on its discriminatory ability.
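The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the dataset characteristics, the classifiers, or the error predictor, so every choice below is an assumption made for illustration. The sketch assumes synthetic two-class Gaussian data, a standardized class-mean difference as the only dataset characteristic (`discriminatory_ability`, a hypothetical name), a nearest-centroid classifier evaluated on a stratified holdout split, and a straight-line fit as the predictor of the classification error.

```python
import random
import statistics

def make_dataset(n_per_class, n_features, shift, rng):
    """Synthetic two-class data; class 1 is shifted by `shift` in every feature."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

def discriminatory_ability(X, y):
    """Step 1 (assumed meta-feature): mean absolute standardized class-mean difference."""
    scores = []
    for column in zip(*X):
        a = [v for v, lab in zip(column, y) if lab == 0]
        b = [v for v, lab in zip(column, y) if lab == 1]
        pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd)
    return statistics.mean(scores)

def holdout_error(X, y, rng):
    """Step 2: error of a nearest-centroid classifier on a stratified 50/50 split."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]

    def predict(row):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda lab: sum(
            (r - c) ** 2 for r, c in zip(row, centroids[lab])))

    wrong = sum(predict(X[i]) != y[i] for i in test)
    return wrong / len(test)

def fit_line(xs, ys):
    """Step 3: ordinary least squares for error = intercept + slope * ability."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

rng = random.Random(0)  # fixed seed so the run is reproducible
records = []            # (discriminatory ability, observed error) per dataset
for shift in [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0]:
    for _ in range(5):
        X, y = make_dataset(40, 5, shift, rng)
        records.append((discriminatory_ability(X, y), holdout_error(X, y, rng)))

slope, intercept = fit_line([r[0] for r in records], [r[1] for r in records])

def predict_error(ability):
    """Predicted error for an unseen dataset, clamped to the valid range [0, 1]."""
    return min(1.0, max(0.0, intercept + slope * ability))

X_new, y_new = make_dataset(40, 5, 1.0, rng)
print("predicted error:", round(predict_error(discriminatory_ability(X_new, y_new)), 3))
print("observed error: ", round(holdout_error(X_new, y_new, rng), 3))
```

In this toy setting the fitted slope is negative: datasets whose features separate the classes more strongly yield lower error, so the error of an unseen dataset can be estimated from its characteristics before any classifier is trained. A straight line is only a crude stand-in for whatever predictor the paper uses, and the true relationship between separation and error is not linear.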

Journal: PloS one	Publication Date: Nov 9, 2022
Citations: 1	License type: CC BY 4.0

Similar Papers

Transplantomics and Biomarkers in Organ Transplantation: A Report From the First International Conference
Minnie M Sarwal ... Kathryn Wood
Transplantation | VOL. 91
27 Feb 2011

Turning quantity into quality: novel quality assurance strategies for data produced by high-throughput genomics technologies
Hans Peter Fischer
TARGETS | VOL. 1
01 Oct 2002

Cutaneous Melanoma Classification: The Importance of High-Throughput Genomic Technologies.
Cristian Scatena ... Sara Tomei
Frontiers in oncology | VOL. 11
28 May 2021

GoSTAG: gene ontology subtrees to tag and annotate genes within a set
Brian D Bennett ... Pierre R Bushel
Source Code for Biology and Medicine | VOL. 12
13 Apr 2017
