Diagnostic biases in translational bioinformatics.

Henry Han

doi:10.1186/s12920-015-0116-y

Abstract

Background
With the surge of translational medicine and computational omics research, complex disease diagnosis increasingly relies on molecular signature detection driven by massive omics data. However, how to detect and prevent possible diagnostic biases in translational bioinformatics remains an unsolved problem despite its importance in the coming era of personalized medicine.

Methods
In this study, we comprehensively investigate the diagnostic bias problem by analyzing benchmark gene array, protein array, RNA-Seq and miRNA-Seq data within the framework of support vector machines (SVMs) under different model selection methods. We further categorize the diagnostic biases into different types through rigorous kernel matrix analysis and provide effective machine learning methods to overcome them.

Results
We find that diagnostic biases occur for data with different distributions and for SVMs with different kernels. We identify three types of diagnostic bias in SVM diagnostics: overfitting bias, label skewness bias, and underfitting bias, and we explain their causes through rigorous analysis. Compared with the overfitting and underfitting biases, the label skewness bias is more challenging to detect and overcome, because its deceptive accuracy can easily make it look like a normal diagnostic case. To tackle this problem, we propose derivative component analysis based support vector machines (DCA-SVM), which overcome the label skewness bias and achieve competitive clinical diagnostic results.

Conclusions
Our studies demonstrate that diagnostic biases are mainly caused by three major factors: kernel selection, the signal amplification mechanism in high-throughput profiling, and the label distribution of the training data. Moreover, the proposed DCA-SVM diagnosis provides a generic solution for overcoming the label skewness bias, owing to the powerful feature extraction capability of derivative component analysis. Our work identifies and solves an important but little-addressed problem in translational research. It also contributes to machine learning by adding new results on kernel-based learning for omics data.
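The abstract links the overfitting bias to kernel selection and kernel matrix analysis. The sketch below is a generic illustration of one well-known mechanism, not the paper's own analysis: for a Gaussian (RBF) kernel K(x, y) = exp(-γ‖x − y‖²), increasing γ on high-dimensional samples drives every off-diagonal entry toward zero, so the kernel matrix approaches the identity. The data are synthetic Gaussian noise standing in for an omics matrix, and the function names and γ values are hypothetical choices made for illustration.

```python
import math
import random


def rbf_kernel_matrix(samples, gamma):
    """Gaussian (RBF) kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def max_off_diagonal(K):
    """Largest kernel value between two different samples."""
    n = len(K)
    return max(K[i][j] for i in range(n) for j in range(n) if i != j)


# Hypothetical stand-in for an omics data matrix: 6 samples x 2000 features.
random.seed(0)
samples = [[random.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(6)]

# As gamma grows, off-diagonal similarities vanish while the diagonal stays at 1,
# so the kernel matrix approaches the identity matrix.
for gamma in (1e-4, 1e-3, 1e-2):
    K = rbf_kernel_matrix(samples, gamma)
    print(f"gamma={gamma:g}  max off-diagonal K = {max_off_diagonal(K):.3e}")
```

When K is close to the identity, each training sample is "similar" only to itself. For a new sample far from all training points, the SVM decision function f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b collapses to roughly the bias term b, so test samples tend to receive the same label regardless of their profile.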

Highlights

With the surge of translational medicine and computational omics research, complex disease diagnosis increasingly relies on molecular signature detection driven by massive omics data
We propose a derivative component analysis based support vector machine (DCA-SVM) to overcome the label skewness bias and compare its performance with those of state-of-the-art peers
Our studies comprehensively identify different diagnostic biases and present novel, effective solutions for this important but little-addressed problem. Compared with our previous work on conquering SVM overfitting [10], this study provides more systematic and novel results on kernel-based learning for omics data and translational bioinformatics

Summary

Introduction

With the surge of translational medicine and computational omics research, complex disease diagnosis increasingly relies on disease signatures discovered from the sheer enormity of high-throughput omics data [1,2,3,4]. Although different state-of-the-art classifiers have been widely employed in such massive data-driven disease diagnostics to enhance diagnostic accuracy, there has been almost no investigation of their diagnostic biases, even though understanding these biases is essential for the success of translational medicine [9, 10]. A biased classifier may favor one phenotype or even ignore the other entirely, even when its diagnostic accuracy appears reasonable. How to detect and prevent such diagnostic biases in translational bioinformatics remains an unsolved problem despite its importance in the coming era of personalized medicine.
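The warning that a classifier may ignore one phenotype while its accuracy still looks reasonable can be made concrete with a small worked example. The sketch below uses hypothetical counts (90 disease and 10 control samples) and a degenerate classifier that labels every sample as disease. It is a generic illustration of why accuracy alone can hide a label skewness bias, not the paper's detection procedure, and the helper name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(y_true, y_pred, positive="disease"):
    """Return (accuracy, sensitivity, specificity) for a binary diagnosis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the disease class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the control class
    return accuracy, sensitivity, specificity


# Hypothetical skewed label distribution: 90 disease vs. 10 control samples.
y_true = ["disease"] * 90 + ["control"] * 10
# Degenerate classifier that always predicts the majority phenotype.
y_pred = ["disease"] * 100

acc, sens, spec = diagnostic_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# prints "accuracy=0.90 sensitivity=1.00 specificity=0.00"
```

Reporting sensitivity and specificity alongside accuracy exposes the problem immediately: the 90% accuracy comes entirely from the majority phenotype, and not a single control sample is recognized.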

Journal: BMC Medical Genomics	Publication Date: Aug 1, 2015
Citations: 56	License type: CC BY 4.0


Similar Papers

Support vector machines framework for linear signal processing
J.L Rojo-Álvarez ... A.R Figueiras-Vidal
Signal Processing | VOL. 85
13 May 2005

The autoimmune targets in IPEX are dominated by gut epithelial proteins
Daniel Eriksson ... Nils Landegren
Journal of Allergy and Clinical Immunology | VOL. 144
23 Apr 2019

Likelihood ratio in a SVM framework: Fusing linear and non-linear face classifiers
Mayank Vatsa ... Arun Ross
01 Jun 2008

Fuzzy regression based on asymmetric support vector machines
Chih-Chia Yao ... Pao-Ta Yu
Applied Mathematics and Computation | VOL. 182
03 May 2006
