Abstract

Background: Discovery of mutated driver genes is one of the primary objectives in the study of tumorigenesis. To discover driver genes that are mutated at relatively low frequencies from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, these network-based methods do not exploit prior information from mRNA expression patterns, which has also been shown to be highly informative of cancer progression.

Results: To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Our framework also applies Frobenius norm regularization to mitigate overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in the gene representations, and the top-scoring genes are selected as driver candidates. Evaluation against known benchmarking genes indicates that the performance of our method benefits from both types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods.

Conclusions: In summary, our proposed method improves the performance of driver gene discovery by effectively incorporating prior information from the interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.

Highlights

  • Discovery of mutated driver genes is one of the primary objectives in the study of tumorigenesis

  • The first benchmarking gene list used for evaluation consists of the 537 experimentally supported driver genes curated by the Cancer Gene Census (CGC) [45]

  • The second benchmarking gene list is from another independent database of cancer drivers called Integrative Onco Genomics (IntOGen) [46]


Summary

Introduction

Discovery of mutated driver genes is one of the primary objectives in the study of tumorigenesis. By propagating "heat" through the network, network propagation methods can find both highly mutated genes and genes that are mutated at relatively low frequencies but are important in the network context. In addition to network propagation, MUFFINN investigates the mutational impact of a gene using only its network neighbors, considering either the highest mutation frequency or the sum of the mutation frequencies of its direct neighbors [21]. These network-based methods have pinpointed many novel mutated driver genes, greatly expanding our understanding of driver events [3, 18, 21].
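
The direct-neighbor idea described above can be illustrated with a small sketch. This is not MUFFINN's actual implementation: the function name `neighbor_scores`, the toy mutation frequencies, and the toy edge list are all hypothetical, and the sketch assumes an undirected network and scores each gene from its neighbors only, as the description states.

```python
def neighbor_scores(freq, edges):
    """Score each gene from the mutation frequencies of its direct neighbors.

    freq  : dict mapping gene name -> mutation frequency (hypothetical input)
    edges : iterable of (gene, gene) pairs, treated as undirected (assumption)
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {gene: set() for gene in freq}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {}
    for gene, nbrs in neighbors.items():
        # Genes missing from `freq` are treated as unmutated (assumption).
        nbr_freqs = [freq.get(n, 0.0) for n in nbrs]
        scores[gene] = {
            "max": max(nbr_freqs, default=0.0),  # highest neighbor frequency
            "sum": sum(nbr_freqs),               # sum over all direct neighbors
        }
    return scores


# Toy data, made up purely for illustration.
freq = {"A": 0.30, "B": 0.02, "C": 0.10, "D": 0.01}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
scores = neighbor_scores(freq, edges)
```

In this toy network, gene B is rarely mutated itself but sits next to the frequently mutated gene A, so both of its neighbor-based scores are high. This is the sense in which a gene with a low mutation frequency can still rank as important in the network context.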

