Abstract
Background: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes.
Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet uses a measure based on betweenness centrality on patient-specific networks to identify so-called outlier genes, which correspond to the dysregulated genes of each patient. It then captures the relationship between the mutated genes and the outliers in a bipartite graph and runs a random-walk process on this graph to obtain the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets.
Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes, as measured against multiple reference databases. Additionally, we show that the GO terms and reference pathways enriched in BetweenNet-ranked genes overlap significantly with those enriched in known cancer genes, compared with the overlaps achieved by the rankings of the alternative methods.
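The abstract does not say how the patient-specific networks are built or exactly how the betweenness-based measure is defined, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that a patient-specific network is the PPI network restricted to the genes expressed in that patient, and that a gene counts as an outlier when its betweenness in that network differs from its betweenness in the full network by at least a hypothetical `threshold`. Betweenness itself is computed with Brandes' algorithm for unweighted, undirected graphs; the function names and the toy network are illustrative, not BetweenNet's code.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (every neighbour must
    also be a key). Scores are unnormalized: for every unordered pair (s, t),
    each node v gains the fraction of shortest s-t paths passing through v.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


def induced_subgraph(adj, keep):
    """Hypothetical patient-specific network: keep only genes in `keep`."""
    return {v: {w for w in adj[v] if w in keep} for v in adj if v in keep}


def outlier_genes(ppi, expressed, threshold):
    """Hypothetical outlier rule: genes whose betweenness in the
    patient-specific network differs from the reference by >= threshold."""
    reference = betweenness(ppi)
    patient = betweenness(induced_subgraph(ppi, expressed))
    return {g for g, score in patient.items() if abs(score - reference[g]) >= threshold}


if __name__ == "__main__":
    # Toy PPI network with edges A-B, B-C, C-D, B-E, E-D.
    ppi = {"A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"},
           "D": {"C", "E"}, "E": {"B", "D"}}
    print(sorted(outlier_genes(ppi, {"A", "B", "C", "D"}, threshold=1.0)))
```

On this toy network, dropping E leaves the path A-B-C-D: B's betweenness falls from 3.5 to 2 and C's rises from 1 to 2, so with `threshold=1.0` the sketch flags B and C. A real pipeline would more likely derive the patient networks and the cutoff from expression data, which this excerpt does not describe.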
Highlights
Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients
One contribution of BetweenNet is the identification of patient-specific dysregulated genes with a measure based on betweenness centrality on personalized networks
A bipartite influence graph is formed to represent the relations between the mutated genes and the dysregulated genes in each patient. Another contribution of BetweenNet is the use of a random-walk process on the resulting bipartite influence graph
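The bipartite influence graph and the random walk over it can be illustrated with a small sketch. The excerpt does not give BetweenNet's edge rule or transition probabilities, so both are assumptions here: every mutated gene of a patient is linked (hypothetically) to every outlier gene of the same patient, with edge weights counting supporting patients, and the walk is a PageRank-style process that follows a weighted edge with probability `damping` and otherwise jumps to a uniformly random node. `build_bipartite`, `random_walk_ranking`, and the toy patients are hypothetical.

```python
from collections import defaultdict


def build_bipartite(patients):
    """patients maps a patient ID to (mutated_genes, outlier_genes).

    Hypothetical edge rule: link every mutated gene to every outlier gene
    of the same patient; the edge weight counts the supporting patients.
    Nodes are tagged with their side so a gene can appear on both sides.
    """
    weights = defaultdict(float)
    for mutated, outliers in patients.values():
        for m in mutated:
            for o in outliers:
                weights[(("mut", m), ("out", o))] += 1.0
    return weights


def random_walk_ranking(weights, damping=0.85, iterations=100):
    """PageRank-style walk on the weighted bipartite graph (power iteration).

    With probability `damping` the walker follows an edge chosen in
    proportion to its weight; otherwise it jumps to a uniformly random node.
    Returns mutated genes sorted by stationary probability, highest first.
    """
    nbrs = defaultdict(dict)
    for (u, v), w in weights.items():
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
    nodes = list(nbrs)
    n = len(nodes)
    prob = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out_weight = sum(nbrs[u].values())  # > 0: every node has an edge
            for v, w in nbrs[u].items():
                nxt[v] += damping * prob[u] * w / out_weight
        prob = nxt
    mutated = [(gene, p) for (side, gene), p in prob.items() if side == "mut"]
    return sorted(mutated, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy input: mutated genes g1..g3 and outlier genes o1..o3 per patient.
    patients = {
        "P1": ({"g1", "g2"}, {"o1", "o2"}),
        "P2": ({"g1"}, {"o1", "o3"}),
        "P3": ({"g3"}, {"o3"}),
    }
    for gene, score in random_walk_ranking(build_bipartite(patients)):
        print(f"{gene}\t{score:.3f}")
```

In the toy input, g1 is linked to all three outlier genes (to o1 twice), so it receives the most probability mass and ranks first. A walk with uniform restarts is just one common choice; a restart vector biased toward particular genes or patients would be an equally plausible variant.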
Summary
Erten et al., BMC Bioinformatics (2021) 22:62
Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. [...] genes or driver modules of genes by integrating mutation data with various other types of genetic data [3,4,5,6,7,8,9,10]; see [11,12,13,14] for recent comprehensive evaluations and surveys on the topic. Rather than outputting a set of candidate driver genes or modules, a subclass of cancer driver identification methods outputs a prioritized list of genes ranked by their cancer-driving potential. Approaches in this group have compared the mutation frequency of each gene with background mutation rates [15,16,17]. A careful review of existing cancer catalogues shows that most tumors share only a small portion of the set of all mutated genes, giving rise to the so-called tumor heterogeneity problem; methods based solely on mutation rates suffer from low sensitivity because of the long tail of infrequently mutated genes [4, 18].
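The cited frequency-based methods [15,16,17] use far richer background models than this, so the following is only a simplified illustration of comparing a gene's mutation frequency with a background rate. It assumes that every base of every patient mutates independently at one uniform per-base rate, so a gene's mutation count follows Binomial(length × patients, rate), and a small upper-tail probability marks the gene as mutated more often than expected. The helper names, gene names, lengths, and rate are all hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Once past the expected count n*p, stop when a term is negligible.
        if i > n * p and term < total * 1e-16:
            break
    return min(total, 1.0)


def mutation_rate_pvalues(counts, lengths, n_patients, background_rate):
    """Hypothetical helper: one-sided test of each gene's mutation count
    against a single uniform per-base background rate."""
    return {
        gene: binomial_upper_tail(c, lengths[gene] * n_patients, background_rate)
        for gene, c in counts.items()
    }


if __name__ == "__main__":
    pvals = mutation_rate_pvalues(
        counts={"short_gene": 30, "long_gene": 15},
        lengths={"short_gene": 1_200, "long_gene": 100_000},
        n_patients=100,
        background_rate=1e-6,
    )
    for gene, pv in sorted(pvals.items(), key=lambda item: item[1]):
        print(f"{gene}\tp = {pv:.3g}")
```

With these made-up inputs, the long gene is expected to carry about 10 background mutations, so observing 15 is unremarkable (p is about 0.08). By contrast, 30 mutations in the short gene, where only about 0.12 are expected, yield a vanishingly small p-value. This length correction is the point of comparing against a background rate rather than ranking genes by raw counts.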