Abstract

In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes that are recurrently mutated more often than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. In addition, high mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation counts, which complicates background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. Moreover, LARVA uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).
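To illustrate why overdispersion matters for background rate estimation, the following is a minimal Python sketch that compares upper-tail p-values under a binomial and a β-binomial model with the same mean. It is not LARVA's implementation: the function names, the parameterization by a mean rate and an overdispersion value `rho`, and the example numbers (trial count, rate, observed count) are all assumptions chosen for illustration only.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binom_pmf(k, n, p):
    """Binomial probability of k mutated trials out of n at rate p (0 < p < 1)."""
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p))


def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability of k mutated trials out of n, shape parameters a, b."""
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def betabinom_params(mean_rate, rho):
    """Convert a mean rate and overdispersion rho = 1 / (a + b + 1) into (a, b).

    This parameterization is an assumption made for the sketch; 0 < rho < 1.
    """
    s = 1.0 / rho - 1.0
    return mean_rate * s, (1.0 - mean_rate) * s


def upper_tail(pmf, k_obs, n, *params):
    """P(X >= k_obs): the chance of seeing at least k_obs mutations by background alone."""
    return min(1.0, sum(pmf(k, n, *params) for k in range(k_obs, n + 1)))


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper's results.
    n, rate, k_obs, rho = 760, 0.01, 20, 0.05
    a, b = betabinom_params(rate, rho)
    print("binomial p-value:     ", upper_tail(binom_pmf, k_obs, n, rate))
    print("beta-binomial p-value:", upper_tail(betabinom_pmf, k_obs, n, a, b))
```

Under these assumed values the two models share the same expected count, but the β-binomial has a much larger variance, so the same observed count is less surprising. A binomial background would therefore tend to call more elements significantly mutated when the data are overdispersed.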

Highlights

  • Genomes of numerous patients have been sequenced [1,2,3,4,5], opening up opportunities to identify the underlying genetic causes for complex disease [6,7,8,9] and develop more effective therapies targeted at specific molecular disease subtypes [10]

  • A large sample difference was observed in several cancer types

  • Due to the rapid decline in the time and cost required to perform whole genome sequencing (WGS), data are available for thousands of genomes where previously only a handful were available [57]


Summary

Introduction

Genomes of numerous patients have been sequenced [1,2,3,4,5], opening up opportunities to identify the underlying genetic causes for complex disease [6,7,8,9] and develop more effective therapies targeted at specific molecular disease subtypes [10]. Most of these studies have so far focused on identifying mutations and defects in the protein coding regions, or exomes, of disease genomes [2,11,12,13,14]. Some references showed that a histone H1 variant is linked to onco-

