wMKL: multi-omics data integration enables novel cancer subtype identification via weight-boosted multi-kernel learning.

Hongyan Cao, Congcong Jia, Ruiling Fang, Haitao Yang, Yanbo Zhang, Yuehua Cui, Zhi Li

doi:10.1038/s41416-024-02587-w

Abstract

Cancer is a heterogeneous disease driven by complex molecular alterations. Cancer subtypes determined from multi-omics data can provide novel insight into personalised precision treatment. It is recognised that incorporating prior weight knowledge into multi-omics data integration can improve disease subtyping. We develop a weighted method, termed weight-boosted Multi-Kernel Learning (wMKL), which incorporates heterogeneous data types as well as flexible weight functions to boost subtype identification. Given a series of weight functions, we propose an omnibus combination strategy that integrates the P-values obtained under the different weight functions to improve subtyping precision. wMKL models each data type with multiple kernel choices, which reduces sensitivity to the selection of kernel parameters and improves robustness. Furthermore, wMKL integrates different data types by learning the weights of the kernels derived from each data type, recognising that data types contribute heterogeneously to the final subtyping performance. The utility and advantage of wMKL are illustrated through extensive simulations and applications to two TCGA datasets, in which wMKL outperforms existing weighted and non-weighted methods. Novel subtypes are identified, followed by extensive downstream bioinformatics analysis to understand the molecular mechanisms that distinguish the subtypes. The proposed wMKL method provides a novel strategy for disease subtyping. wMKL is freely available at https://github.com/biostatcao/wMKL.
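To make the multi-kernel idea concrete, the following is a minimal Python sketch of building several Gaussian kernels per data type and fusing them with kernel weights. It is an illustration under stated assumptions, not the wMKL implementation: the function names gaussian_kernels and combine_kernels, the bandwidth scales, the median-distance heuristic and the simulated matrices are all hypothetical, and the uniform weights stand in for the weights that wMKL learns from the data.

```python
import numpy as np

def gaussian_kernels(X, scales=(0.5, 1.0, 2.0)):
    """Return one Gaussian kernel per bandwidth scale.

    X is a samples-by-features matrix. The bandwidth is set relative to the
    median squared pairwise distance (an assumed heuristic, not from the paper).
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)          # remove rounding noise on the diagonal
    base = np.median(d2[d2 > 0])       # median of off-diagonal squared distances
    return [np.exp(-d2 / (2.0 * s * base)) for s in scales]

def combine_kernels(kernels, weights):
    """Fuse kernels as a convex combination (weights are normalised to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Simulated stand-ins for two omics layers measured on the same 20 samples.
rng = np.random.default_rng(0)
expression = rng.normal(size=(20, 50))
methylation = rng.normal(size=(20, 30))

# Several kernels per data type, so no single bandwidth has to be chosen.
kernels = gaussian_kernels(expression) + gaussian_kernels(methylation)

# Uniform weights are a placeholder; a multi-kernel learner would estimate them.
fused = combine_kernels(kernels, np.ones(len(kernels)))
```

The fused matrix is a sample-by-sample similarity that could then be passed to a clustering step such as spectral clustering. Using several bandwidths per data type is what spares the analyst from committing to a single kernel parameter.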

Full Text