Abstract

Cancer is a complex genomic disease characterized by the accumulation of somatic mutations over the lifetime of a patient. Identifying somatic driver mutations that contribute to tumorigenesis is a major goal of cancer genomics. With recent advances in sequencing technologies, it has become possible to study somatic mutations on a whole-genome scale in multiple cancers. While most cancer genomics studies have previously focused on identifying driver mutations affecting exons, several examples of driver events within non-protein-coding regions of the genome have been identified, including the recurrent TERT promoter mutations. Such findings have spurred searches for similar examples of recurrent non-coding mutations using computational cancer genomics. In my PhD thesis, I present several computational approaches aimed at identifying somatic driver mutations, with a specific focus on intergenic regions of the genome. The first part of this thesis focuses on somatic mutational patterns along the cancer genome and addresses a fundamental problem in the computational identification of recurrently mutated regions: regional mutational heterogeneity. Here I studied the correlation of specific genomic features with background somatic mutation rates and devised a background model that accounts for regional mutational heterogeneity. The second part of this thesis describes three different computational approaches designed to identify somatic driver events of functional relevance in cancer. The first approach integrates somatic mutation calls with gene expression data to identify variants associated with altered mRNA levels. The second approach is designed to predict changes in transcription factor binding sites in the presence of recurrent somatic mutations. The third approach uses a cross-validation scheme to enable parameter tuning in screens for recurrently somatically mutated regions in cancer genomes in an unbiased, genome-wide manner.
Using this approach, we identify several known cancer-relevant targets, including both exonic regions (e.g., the TP53, MYC, and SMARCA4 genes) and non-coding regulatory regions (e.g., the TERT promoter), and uncover novel candidate regulatory driver regions. Among those is a cluster of recurrent intergenic mutations in an enhancer element near the FADS2 gene, which encodes a critical enzyme in the biosynthesis of long-chain polyunsaturated fatty acids and has previously been implicated in cancer. Collectively, the computational approaches presented here helped uncover novel candidate somatic events of relevance in cancer and can be applied further in cancer genomics.
