Batch normalization followed by merging is powerful for phenotype prediction integrating multiple heterogeneous studies.

Yilin Gao, Fengzhu Sun

doi:10.1371/journal.pcbi.1010608

Abstract

Heterogeneity across genomic studies compromises the performance of machine learning models in cross-study phenotype prediction. Overcoming this heterogeneity when combining studies is a challenging and critical step toward machine learning algorithms whose prediction performance is reproducible on independent datasets. We investigated the best approaches for integrating different studies of the same type of omics data under a variety of heterogeneities. We developed a comprehensive workflow to simulate different types of heterogeneity and to evaluate the performance of different integration methods combined with batch normalization using ComBat. We also demonstrated the results in realistic applications to six colorectal cancer (CRC) metagenomic studies and six tuberculosis (TB) gene expression studies. We showed that heterogeneity across genomic studies can markedly reduce the reproducibility of a machine learning classifier. ComBat normalization improved the classifier's prediction performance when heterogeneous populations were present and successfully removed batch effects within the same population. We also showed that the classifier's prediction accuracy can decrease markedly as the underlying disease models of the training and test populations become more different. Comparing merging and integration methods, we found that each can outperform the other in different scenarios. In the realistic applications, prediction accuracy improved when ComBat normalization was applied together with merging or integration methods in both the CRC and TB studies. We illustrated that batch normalization is essential for mitigating both population differences between studies and batch effects. We also showed that both the merging strategy and the integration methods can achieve good performance when combined with batch normalization. In addition, we explored the potential of boosting phenotype prediction performance with rank aggregation methods and showed that they performed similarly to other ensemble learning approaches.
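The "normalize each study, then merge" idea described in the abstract can be illustrated with a minimal sketch. This is not ComBat and not the authors' workflow: it swaps in a plain per-study, per-feature z-score as a stand-in for batch normalization, with no empirical Bayes shrinkage and no protection of phenotype-related variation. The function names `zscore_within_study` and `normalize_then_merge` and the simulated studies are hypothetical, chosen only for illustration.

```python
import numpy as np


def zscore_within_study(X):
    """Center and scale each feature (column) using this study's own statistics.

    Simplified stand-in for batch normalization; NOT ComBat.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return (X - mean) / std


def normalize_then_merge(studies):
    """Normalize every study separately, then stack them into one matrix.

    Returns the merged samples-by-features matrix and, for each row, the
    index of the study it came from.
    """
    adjusted = [zscore_within_study(X) for X in studies]
    labels = np.concatenate(
        [np.full(len(X), i) for i, X in enumerate(adjusted)]
    )
    return np.vstack(adjusted), labels


# Two simulated studies measuring the same 4 features, where the second
# study carries an additive shift and a different scale (a toy batch effect).
rng = np.random.default_rng(0)
study_a = rng.normal(loc=0.0, scale=1.0, size=(50, 4))
study_b = rng.normal(loc=3.0, scale=2.0, size=(60, 4))

# Largest per-feature difference in study means before any adjustment.
gap_before = np.abs(study_a.mean(axis=0) - study_b.mean(axis=0)).max()

merged, labels = normalize_then_merge([study_a, study_b])

# The same difference after per-study normalization and merging.
gap_after = np.abs(
    merged[labels == 0].mean(axis=0) - merged[labels == 1].mean(axis=0)
).max()
```

The merged matrix could then be passed to a single classifier trained on all studies at once. One caveat of this simplified version: if the proportion of cases and controls differs between studies, per-study centering can also remove part of the phenotype signal, which is one reason dedicated methods such as ComBat model known covariates explicitly.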

Journal: PLOS Computational Biology
Publication Date: Oct 16, 2023
License type: CC BY 4.0

