Characterization of cancer genomes through systematic analyses of oncogenomic data assemblies

Haoyang Cai

doi:10.5167/uzh-164279

Abstract

Cancer is the most common genetic disease in humans. More than 10 million new cancer cases are estimated to be diagnosed worldwide each year. Over the last decades, the research community has made many efforts to contribute to the fight against cancer, and this work has greatly expanded our understanding of the disease. However, the exact mechanisms of cancer initiation and progression remain elusive. Research on cancer genomes has focused on the identification of DNA sequence mutations and chromosomal rearrangements. Some of these somatic alterations can confer a growth advantage on cancer cells and promote cancer development. Genes mutated in cancer genomes can be potential new drug targets or serve as biomarkers for the improvement of diagnostics and therapy. Today, high-throughput genome-wide profiling technologies allow us to characterize the molecular profiles of cancer samples on various levels, including copy number alterations, gene expression, point mutations and epigenetic marks. Cancer research has gradually shifted from single experiments to large-scale “omics” data analysis approaches. This is exciting but challenging work. Our group aims to develop reliable and robust methods to characterize cancer genomes by analyzing large-scale oncogenomic datasets. During the last four years, I have focused my efforts on using systems biology and statistical methods to model and annotate genomic array data in human cancer. My research is based on a data collection and re-analysis project that generates very large amounts of microarray data. Computational biology approaches were applied to this dataset for data mining. We collected more than 40,000 arrays, including comparative genomic hybridization (CGH) and single nucleotide polymorphism (SNP) arrays, from several public databases. A pipeline was developed to process raw data and determine copy number aberrations (CNAs). 
All data were converted to a unified, structured format and stored in our arrayMap database, together with the available clinical information. We also set up a website to provide this resource to the research community. Based on the large-scale CNA data in our database, the second project aimed to explore the correlation between CNAs and local gene density across cancer genomes. Through a genome binning method, I found that focal CNAs are significantly enriched in gene-rich regions. In addition, this positive correlation is not driven by cancer genes alone. Since this result is derived from more than 16,000 cancer samples, it provides a global insight into the relationship between cancer genome instability and structure from a new perspective. The enrichment suggests that there may be a non-neutral selection pressure on CNA regions across the genome. The significant positive correlation observed in this project may enable a better elucidation of the mechanisms by which CNAs contribute to tumor development, and promote a more systematic understanding of cancer. The third project presented here is related to a new phenomenon in cancer development, termed “chromothripsis”. In this type of event, contiguous chromosomal regions are fragmented into many pieces, and the cell’s DNA repair machinery randomly fuses these segments together to rescue the genome. This is quite different from the classical step-by-step model of cancer development. We developed an algorithm based on scan statistics to automatically detect chromothripsis-like patterns and to identify both the size and the location of the involved regions. From our input of 22,347 high-quality arrays, we identified 918 chromothripsis cases, representing 132 cancer types. The results from this dataset provide several new insights regarding the distribution of chromothripsis-like patterns and a comprehensive estimation of chromothripsis incidence in a large range of cancer entities. 
Importantly, our work partly overcomes the limitation that individual research projects face because of the relatively low incidence of chromothripsis in the available cancer samples. An investigation into the affected chromosomal regions supports breakage-fusion-bridge cycles as one of the potential underlying mechanisms. Finally, we evaluated the clinical associations of chromothripsis and found that this event may be associated with a poor outcome. The chromothripsis events observed in our project may reflect heterogeneous biological phenomena and probably vary in their specific impact on oncogenesis. Taken together, the results presented in this thesis characterize the cancer genome using large-scale oncogenomic array data, and further elucidate the potential mechanisms underlying cancer development.
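
The abstract does not spell out the scan-statistic algorithm, so the following is only a minimal sketch of the general idea, not the method used in the thesis. It assumes that copy number breakpoints on one chromosome are given as base-pair positions, slides a fixed-width window along them, and compares the densest window against a uniform null by Monte Carlo simulation. The function names, the window width and the choice of null model are all illustrative assumptions.

```python
import random
from bisect import bisect_left


def max_window_count(positions, window):
    """Return (count, start) for the densest half-open window [start, start + window).

    Only windows that start at an observed position are checked, which is
    sufficient to find the maximum count.
    """
    pts = sorted(positions)
    best_count, best_start = 0, None
    for i, start in enumerate(pts):
        # Index of the first point at or beyond the right edge of the window.
        j = bisect_left(pts, start + window)
        if j - i > best_count:
            best_count, best_start = j - i, start
    return best_count, best_start


def scan_p_value(positions, chrom_length, window, n_sim=500, seed=0):
    """Monte Carlo p-value for the densest window (hypothetical uniform null)."""
    rng = random.Random(seed)
    observed, start = max_window_count(positions, window)
    hits = 0
    for _ in range(n_sim):
        # Place the same number of breakpoints uniformly along the chromosome.
        sim = [rng.uniform(0, chrom_length) for _ in positions]
        if max_window_count(sim, window)[0] >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, start, (hits + 1) / (n_sim + 1)


# Toy input: 5 scattered breakpoints plus 20 clustered between 30 and 35 Mb.
scattered = [2_000_000, 15_000_000, 55_000_000, 70_000_000, 90_000_000]
clustered = [30_000_000 + k * 250_000 for k in range(20)]
count, start, p = scan_p_value(scattered + clustered,
                               chrom_length=100_000_000,
                               window=10_000_000)
print(count, start, p)
```

In this toy input the densest 10 Mb window starts at the first clustered breakpoint and contains all 20 of them, so the simulated null is very unlikely to match it. A real detector would also need to account for array probe density, segmentation noise and the oscillating copy number states typical of chromothripsis, none of which are modeled here.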

Full Text