Abstract

Background: Large-scale cancer genomics projects are producing large amounts of data on genomic, epigenomic and gene expression aberrations in many cancer types. A key challenge in cancer genomics is to detect functional driver pathways and to filter out nonfunctional passenger genes. Vandin et al. introduced the Maximum Weight Sub-matrix Problem to find driver pathways and showed that it is NP-hard.

Methods: To find better solutions and solve the problem more efficiently, we present a network-based method (NBM) that detects overlapping driver pathways automatically. Without any prior information, the algorithm finds driver pathways or gene sets de novo from somatic mutation data by exploiting two combinatorial properties: high coverage and high exclusivity. First, we use somatic mutation data from many cancer patients to construct a gene network based on the approximate exclusivity between each pair of genes. Second, we apply a new greedy strategy that adds or removes genes to obtain overlapping gene sets with driver mutations, guided by the properties of high exclusivity and high coverage.

Results: To assess the efficiency of NBM, we apply it to simulated data and compare its results with those obtained from RME, Dendrix and Multi-Dendrix. NBM obtains optimal results in less than nine seconds on a conventional computer, and its time complexity is much lower than that of the other three methods. To further verify its performance, we apply NBM to somatic mutation data from five real biological data sets, including the mutation profiles of 90 glioblastoma tumor samples and 163 lung carcinoma samples. NBM detects groups of genes that overlap with known pathways, including the P53, RB and RTK/RAS/PI(3)K signaling pathways, and finds new gene sets with p-values less than 1e-3.

Conclusions: NBM can detect more biologically relevant gene sets. Results show that NBM outperforms other algorithms for detecting driver pathways or gene sets. Future research will explore novel machine learning techniques.
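
The two NBM steps summarized above (build a pairwise exclusivity network, then greedily add or remove genes) can be sketched as follows. This summary does not give NBM's exact exclusivity measure or growth criterion, so the sketch rests on three assumptions: two genes are linked when the overlap ratio |Γ(a) ∩ Γ(b)| / min(|Γ(a)|, |Γ(b)|) is at most a hypothetical threshold `max_overlap`; a gene may be added only if it is linked to every gene already in the set; and the objective is the weight from Vandin et al.'s Maximum Weight Sub-matrix Problem, W(M) = 2|Γ(M)| − Σ|Γ(g)|, i.e. coverage minus coverage overlap. The function names and toy data are hypothetical; this illustrates the idea, not the published algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set

# Γ(g): patients in which gene g is mutated (one column of the mutation matrix).
Gamma = Dict[str, FrozenSet[int]]


def weight(genes: Set[str], gamma: Gamma) -> int:
    """Maximum Weight Sub-matrix objective W(M) = 2|Γ(M)| - Σ_g |Γ(g)|."""
    covered: Set[int] = set()
    for g in genes:
        covered |= gamma[g]
    return 2 * len(covered) - sum(len(gamma[g]) for g in genes)


def exclusivity_network(gamma: Gamma, max_overlap: float = 0.1) -> Dict[str, Set[str]]:
    """Link gene pairs that are approximately mutually exclusive.

    Assumed criterion: |Γ(a) ∩ Γ(b)| / min(|Γ(a)|, |Γ(b)|) <= max_overlap.
    """
    adj: Dict[str, Set[str]] = {g: set() for g in gamma}
    for a, b in combinations(sorted(gamma), 2):
        smaller = min(len(gamma[a]), len(gamma[b]))
        if smaller and len(gamma[a] & gamma[b]) / smaller <= max_overlap:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def grow(seed: str, gamma: Gamma, adj: Dict[str, Set[str]]) -> Set[str]:
    """Greedy growth from a seed: apply the best add/remove move while W rises."""
    current: Set[str] = {seed}
    best = weight(current, gamma)
    while True:
        # Assumption: only genes linked to every current member may be added.
        addable = set.intersection(*(adj[g] for g in current)) - current
        moves: List[Set[str]] = [current | {g} for g in sorted(addable)]
        if len(current) > 1:
            moves += [current - {g} for g in sorted(current)]
        best_move: Optional[Set[str]] = None
        for m in moves:
            w = weight(m, gamma)
            if w > best:  # strict increase of a bounded integer guarantees termination
                best_move, best = m, w
        if best_move is None:
            return current
        current = best_move


# Hypothetical toy data: g4 overlaps heavily with g1 and g2.
gamma: Gamma = {
    "g1": frozenset({0, 1, 2}),
    "g2": frozenset({3, 4}),
    "g3": frozenset({5}),
    "g4": frozenset({0, 1, 3, 4}),
}
adj = exclusivity_network(gamma)
print(sorted(grow("g1", gamma, adj)))  # ['g1', 'g2', 'g3']
```

Seeding the growth from each gene in turn and keeping the distinct results is one simple way to obtain overlapping gene sets, because a gene can end up in the sets grown from several different seeds.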

Highlights

  • Large-scale cancer genomics projects are producing large amounts of data on genomic, epigenomic and gene expression aberrations in many cancer types

  • We present a novel greedy growth process, based on the concepts of high coverage and high exclusivity, to find gene sets in the gene network constructed in the previous step

  • A gene set $M$ is mutually exclusive if $\Gamma(g_j) \cap \Gamma(g_k) = \emptyset$ for all $g_j, g_k \in M$ with $g_j \neq g_k$, $1 \le j, k \le n$

  • A driver pathway is a gene set in $A$, i.e. a column sub-matrix of $A$, with high coverage and high exclusivity
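
The coverage and mutual-exclusivity definitions in the highlights can be made concrete with a short sketch. It assumes the mutation data are stored as a mapping from each gene (a column of A) to the set of patients in which that gene is mutated, i.e. Γ(g); the function names `coverage` and `is_mutually_exclusive` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Γ(g): the set of patients (rows of A) in which gene g (a column of A) is mutated.
Gamma = Dict[str, FrozenSet[int]]


def coverage(genes: Iterable[str], gamma: Gamma) -> Set[int]:
    """Γ(M): the union of Γ(g) over g in M, i.e. patients with a mutation in M."""
    covered: Set[int] = set()
    for g in set(genes):
        covered |= gamma[g]
    return covered


def is_mutually_exclusive(genes: Iterable[str], gamma: Gamma) -> bool:
    """True iff Γ(g_j) ∩ Γ(g_k) = ∅ for all distinct g_j, g_k in M."""
    seen: Set[int] = set()
    for g in set(genes):
        # Disjoint from the union of earlier genes <=> disjoint from each of them.
        if seen & gamma[g]:
            return False
        seen |= gamma[g]
    return True


# Hypothetical toy data with 6 patients (0..5).
gamma: Gamma = {
    "g1": frozenset({0, 1, 2}),
    "g2": frozenset({3, 4}),
    "g3": frozenset({2, 5}),  # shares patient 2 with g1
}
print(is_mutually_exclusive(["g1", "g2"], gamma))  # True
print(is_mutually_exclusive(["g1", "g3"], gamma))  # False
```

A set such as {g1, g2} is exclusive but leaves patient 5 uncovered, while adding g3 covers every patient at the cost of one overlap; this is the coverage versus exclusivity trade-off that a driver-pathway search has to balance.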


Summary

Introduction

Large-scale cancer genomics projects are producing large amounts of data on genomic, epigenomic and gene expression aberrations in many cancer types. One key challenge in cancer genomics is to detect functional driver pathways and to filter out nonfunctional passenger genes. A common approach to detecting driver mutations is to look for genes that are recurrently mutated across a large number of cancer patients. The standard technique for finding recurrently mutated genes tests each gene individually to determine whether its mutation frequency is significantly higher than expected [1]. This statistical approach has identified many important cancer genes, but it cannot be used to identify driver mutation pathways and driver genes in cancer.
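
The single-gene recurrence test described above can be illustrated with a minimal sketch. It assumes a one-sided binomial model with a single background mutation probability `p` per sample, which is a simplification: practical recurrence tests usually model background mutation rates in more detail. The function names and the numbers in the example are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    The probability that k or more of n samples carry a mutation in the gene
    if mutations arose only at the (assumed) background rate p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_recurrently_mutated(k: int, n: int, p: float, alpha: float = 0.05) -> bool:
    """One-sided test of a single gene: is its mutation count higher than expected?"""
    return binomial_tail(k, n, p) < alpha


# Hypothetical example: 90 tumour samples, assumed background rate of 2% per sample
# (about 1.8 mutated samples expected by chance).
print(is_recurrently_mutated(10, 90, 0.02))  # True
print(is_recurrently_mutated(2, 90, 0.02))   # False
```

Testing thousands of genes this way also requires a multiple-testing correction. Because each gene is tested on its own, the test cannot reveal that several individually rare, mutually exclusive mutations jointly hit the same pathway.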
