Abstract

Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. As high-throughput multi-omics data have accumulated over decades, integrating these data effectively to identify cancer driver genes has itself become a major challenge. Here, we propose MODIG, a graph attention network (GAT)-based framework that identifies cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions, gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns and gene ontology. Then, we constructed a multi-dimensional gene network consisting of approximately 20,000 genes as nodes and five types of gene associations as multiplex edges. Based on this graph, we applied a GAT to model within-dimension interactions and generate a gene representation for each dimension. Moreover, we introduced a joint learning module that fuses the dimension-specific representations into general gene representations. Finally, we used the resulting gene representations to perform a semi-supervised driver gene identification task. The experimental results show that MODIG outperforms the baseline models in terms of the area under the precision-recall curve and the area under the receiver operating characteristic curve. The MODIG program is available at https://github.com/zjupgx/modig. The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241. Supplementary data are available at Bioinformatics online.
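
The pipeline described in the abstract (one attention layer per network dimension, a fusion step across dimensions, and a semi-supervised classifier) can be illustrated with a minimal NumPy sketch. This is not the MODIG implementation: the toy graph, the random untrained weights, the single-head attention layer, the attention-style fusion in `fuse`, and all function and variable names are assumptions made for illustration only. The abstract does not specify how the joint learning module combines the representations.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gat_layer(X, A, W, a_src, a_dst):
    """Single-head graph attention layer on one network dimension.

    X: (n, f) gene features, A: (n, n) binary adjacency,
    W: (f, h) projection, a_src / a_dst: (h,) attention vectors.
    """
    H = X @ W                                         # projected features (n, h)
    # Raw score for edge i -> j: LeakyReLU(a_src . H[i] + a_dst . H[j])
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    # Attend only over neighbours; self-loops keep every row non-empty.
    mask = (A + np.eye(len(A))) > 0
    e = np.where(mask, e, -np.inf)
    alpha = softmax(e, axis=1)                        # each row sums to 1
    return np.tanh(alpha @ H)


def fuse(Z, q):
    """Assumed fusion step: per-gene attention over the K dimensions.

    Z: (K, n, h) dimension-specific embeddings, q: (h,) scoring vector.
    Returns the fused (n, h) embeddings and the (n, K) weights.
    """
    scores = np.einsum("knh,h->nk", Z, q)
    beta = softmax(scores, axis=1)
    fused = np.einsum("nk,knh->nh", beta, Z)
    return fused, beta


rng = np.random.default_rng(0)
n_genes, n_feats, hidden = 6, 4, 8                    # toy sizes, not the real data
dims = ["ppi", "sequence", "pathway", "coexpression", "go"]

# Toy omics features: one column each for mutation, CNV, expression, methylation.
X = rng.normal(size=(n_genes, n_feats))

# One random symmetric adjacency matrix per network dimension.
adjs = []
for _ in dims:
    upper = np.triu(rng.random((n_genes, n_genes)) < 0.4, k=1)
    adjs.append((upper | upper.T).astype(float))

# Dimension-specific representations, each from its own (untrained) GAT layer.
Z = np.stack([
    gat_layer(X, A,
              rng.normal(size=(n_feats, hidden)),
              rng.normal(size=hidden),
              rng.normal(size=hidden))
    for A in adjs
])

fused, beta = fuse(Z, rng.normal(size=hidden))
driver_prob = 1.0 / (1.0 + np.exp(-(fused @ rng.normal(size=hidden))))

# Semi-supervised objective: the loss uses only genes with known labels.
labels = np.array([1, 0, 1, 0, 0, 0])
labeled = np.array([True, True, True, False, False, False])
p = np.clip(driver_prob[labeled], 1e-7, 1 - 1e-7)
y = labels[labeled]
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

In a real setting the weights would be trained by minimising `loss` with gradient descent, and predictions for the unlabeled genes would then be read from `driver_prob`.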

