Abstract

Background
Pathway analysis of a set of genes is an important area in large-scale omic data analysis. However, applying traditional pathway enrichment methods to next-generation sequencing (NGS) data is prone to several potential biases, including genomic/genetic factors (e.g., the particular disease and gene length) and environmental factors (e.g., personal lifestyle and the frequency and dosage of exposure to mutagens). Novel methods are therefore urgently needed for these new data types, especially for individual-specific genome data.

Methodology
In this study, we proposed a novel method for the pathway analysis of NGS mutation data that explicitly takes into account the gene-wise mutation rate. We estimated the gene-wise mutation rate from the individual-specific background mutation rate together with the gene length. Using the mutation rate as a weight for each gene, our weighted resampling strategy builds the null distribution for each pathway while matching the gene length patterns. The resulting empirical P value provides an adjusted statistical evaluation.

Principal Findings/Conclusions
We applied our weighted resampling method to a lung adenocarcinoma dataset and a glioblastoma dataset and compared it with other widely applied methods. By explicitly adjusting for gene length, the weighted resampling method performs as well as the standard methods for significant pathways with strong evidence. Importantly, our method could effectively reject many marginally significant pathways detected by standard methods, including several long-gene-based, cancer-unrelated pathways. We further demonstrated that, by reducing such biases, pathway crosstalk for each individual and a pathway co-mutation map across multiple individuals can be objectively explored and evaluated. This method performs pathway analysis in a sample-centered fashion and provides an alternative way to accurately analyze personalized cancer genomes. It can be extended to other types of genomic data (genotyping and methylation) that have similar bias problems.
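
The weighted resampling idea can be sketched in Python. This is a minimal illustration, not the authors' implementation: the gene-wise mutation probability 1 - (1 - bmr)^L (the chance that a gene of length L carries at least one background mutation at per-base rate bmr), the function names, the add-one empirical P value, and the permutation count are all assumptions made for illustration.

```python
import numpy as np


def gene_mutation_prob(gene_lengths, bmr):
    """Assumed model: probability that a gene of length L (bp) carries at
    least one background mutation, given a per-base rate bmr."""
    lengths = np.asarray(gene_lengths, dtype=float)
    return 1.0 - (1.0 - bmr) ** lengths


def weighted_resampling_pvalue(genes, gene_lengths, mutated, pathway,
                               bmr, n_perm=10000, seed=0):
    """Hypothetical helper: empirical P value for one pathway in one individual.

    genes: all gene IDs; gene_lengths: matching lengths in bp;
    mutated: set of genes mutated in this individual; pathway: set of genes.
    """
    rng = np.random.default_rng(seed)
    weights = gene_mutation_prob(gene_lengths, bmr)
    weights = weights / weights.sum()        # normalise to sampling probabilities
    in_pathway = np.array([g in pathway for g in genes])
    observed = len(mutated & pathway)        # mutated genes that hit the pathway
    n_mut = len(mutated)
    hits = 0
    for _ in range(n_perm):
        # Draw as many distinct genes as were mutated, favouring genes with a
        # higher mutation probability (i.e. longer genes under this model).
        idx = rng.choice(len(genes), size=n_mut, replace=False, p=weights)
        if in_pathway[idx].sum() >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)         # add-one so P is never exactly 0
```

Because genes are sampled in proportion to their mutation probability, long genes are drawn more often under the null, so a pathway made of long genes needs more mutated members than a uniform permutation would demand before it reaches significance.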

Highlights

  • In large-scale sequencing studies of cancer genomes, one of the central challenges is to distinguish disease-causing “driver” mutations from “passenger” mutations, thereby allowing the development of targeted therapy and medication.

  • The most recent findings of The Cancer Genome Atlas (TCGA) projects strongly […] probability of a pathway being enriched with mutated genes, a brute-force way of computing the exact P values was described, and a convolution-based approximation strategy was proposed to reduce the computational burden.

  • We proposed measuring node-based pathway crosstalk with the Jaccard coefficient (JC), which has been widely applied in set-based analysis [16,17].
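
The Jaccard coefficient of two gene sets A and B is |A ∩ B| / |A ∪ B|, ranging from 0 (no shared genes) to 1 (identical sets). The sketch below assumes that each pathway is represented by a set of genes (for example, the genes mutated in one individual that fall in that pathway) and that crosstalk is reported as a pairwise JC; the helper names and the threshold parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def crosstalk_edges(pathways, threshold=0.0):
    """Hypothetical helper: pairwise JC between pathways (name -> gene set),
    keeping only pairs whose JC is strictly above the threshold."""
    names = sorted(pathways)
    edges = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            jc = jaccard(pathways[p], pathways[q])
            if jc > threshold:
                edges[(p, q)] = jc
    return edges
```

For instance, gene sets {A, B, C} and {B, C, D} share two of four distinct genes, giving JC = 0.5.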

Summary

Introduction

In large-scale sequencing studies of cancer genomes, one of the central challenges is to distinguish disease-causing “driver” mutations from “passenger” mutations, thereby allowing the development of targeted therapy and medication. Some well-studied examples include mutually exclusive mutations such as EGFR and KRAS in lung cancer [1], and TP53 and MDM2 in glioblastoma. The most recent findings of The Cancer Genome Atlas (TCGA) projects strongly […] probability of a pathway being enriched with mutated genes, a brute-force way of computing the exact P values was described, and a convolution-based approximation strategy was proposed to reduce the computational burden. However, applying traditional pathway enrichment methods to next-generation sequencing (NGS) data is prone to several potential biases, including genomic/genetic factors (e.g., the particular disease and gene length) and environmental factors (e.g., personal lifestyle and the frequency and dosage of exposure to mutagens). Novel methods are therefore urgently needed for these new data types, especially for individual-specific genome data.
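
To make the brute-force versus convolution contrast concrete, here is a minimal sketch under a simplifying assumption: each pathway gene is mutated independently with a known probability, and the P value is the probability that at least k pathway genes are mutated. Enumeration visits all 2^n mutation patterns, whereas convolving the per-gene Bernoulli distributions one gene at a time costs O(n²). Under this simplified model the convolution is exact; the sketch does not reproduce the approximation strategy of the cited work, and the function names are hypothetical.

```python
from itertools import product


def pvalue_brute_force(probs, k):
    """P(X >= k), X = number of mutated genes, by enumerating all 2^n patterns."""
    total = 0.0
    for pattern in product((0, 1), repeat=len(probs)):
        if sum(pattern) >= k:
            pr = 1.0
            for hit, p in zip(pattern, probs):
                pr *= p if hit else 1.0 - p
            total += pr
    return total


def pvalue_convolution(probs, k):
    """Same tail probability, built by convolving one Bernoulli pmf per gene."""
    pmf = [1.0]  # distribution of X before any gene is added
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for x, mass in enumerate(pmf):
            nxt[x] += mass * (1.0 - p)   # this gene not mutated
            nxt[x + 1] += mass * p       # this gene mutated
        pmf = nxt
    return sum(pmf[k:])
```

With per-gene probabilities [0.1, 0.2, 0.3, 0.4], both functions give P(X ≥ 1) = 1 - 0.9 × 0.8 × 0.7 × 0.6 = 0.6976, but only the convolution remains practical as the pathway grows.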
