A feature extraction framework for discovering pan‐cancer driver genes based on multi‐omics data

Xiaomeng Xue, Feng Li, Junliang Shang, Lingyun Dai, Daohui Ge, Qianqian Ren

doi:10.1002/qub2.40

Abstract

The identification of tumor driver genes facilitates accurate cancer diagnosis and treatment, playing a key role in precision oncology alongside gene signaling, regulation, and interactions with protein complexes. To tackle the challenge of distinguishing driver genes within large volumes of genomic data, we construct a feature extraction framework for discovering pan‐cancer driver genes based on multi‐omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information, to obtain functional features that represent cancer‐specific information. From these functional features, we extract three kinds of pan‐cancer features: distribution features of the pan‐cancer data; TOPSIS features, computed from the functional features with the ideal solution method; and SetExpan features, obtained with a method that ranks pan‐cancer data by average inverse rank. These features represent the information common across cancers. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision‐recall curve (AUPRC) and demonstrates better performance across different PPI networks. This indicates our framework's effectiveness in predicting potential cancer genes, offering valuable insights for the diagnosis and treatment of tumors.
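
The feature pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the propagation variant, so random walk with restart is assumed here; TOPSIS is shown with equal weights and every column treated as a benefit criterion; and the average inverse rank is computed as a plain mean of reciprocal ranks. The function names (`propagate`, `topsis`, `mean_reciprocal_rank`) and the parameter `alpha` are hypothetical. The final LightGBM classification step is omitted.

```python
import numpy as np


def propagate(adj, scores, alpha=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart over a PPI adjacency matrix (assumed variant).

    adj:    (n_genes, n_genes) symmetric adjacency matrix.
    scores: (n_genes,) or (n_genes, n_cancers) initial omics scores.
    alpha:  restart probability (hypothetical default).
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0          # avoid division by zero for isolated nodes
    w = adj / deg                # column-normalised transition matrix
    p0 = np.asarray(scores, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - alpha) * (w @ p) + alpha * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def topsis(features):
    """TOPSIS closeness per gene (rows), treating columns as benefit criteria."""
    x = np.asarray(features, dtype=float)
    norm = np.linalg.norm(x, axis=0)
    norm[norm == 0] = 1.0
    v = x / norm                                  # vector-normalised matrix
    best, worst = v.max(axis=0), v.min(axis=0)    # ideal and anti-ideal points
    d_best = np.linalg.norm(v - best, axis=1)
    d_worst = np.linalg.norm(v - worst, axis=1)
    denom = d_best + d_worst
    denom[denom == 0] = 1.0
    return d_worst / denom


def mean_reciprocal_rank(features):
    """Average of 1/rank per gene, where rank 1 is the top score in a column."""
    x = np.asarray(features, dtype=float)
    order = np.argsort(-x, axis=0, kind="stable")  # ties broken by row order
    ranks = np.empty_like(order)
    positions = np.arange(1, x.shape[0] + 1)
    for j in range(x.shape[1]):
        ranks[order[:, j], j] = positions
    return (1.0 / ranks).mean(axis=1)
```

Under these assumptions, a gene‐by‐cancer score matrix would first be smoothed with `propagate`, and the resulting functional features would then be summarized per gene with `topsis` and `mean_reciprocal_rank` before being passed to a classifier.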

Full Text