Abstract
Recent studies have demonstrated the power of deep re-sequencing of the whole genome or exome in understanding cancer genomes. However, targeted capture of selected whole gene-body regions, rather than the whole exome, has several advantages: 1) the genes can be selected based on biology or a hypothesis; 2) mutations in promoter and intronic regions, which have important regulatory roles, can be investigated; and 3) it is less expensive than whole genome or whole exome sequencing. Therefore, we designed custom high-density oligonucleotide microarrays (NimbleGen Inc.) to capture approximately 1.7 Mb of target regions comprising the genomic regions of 28 genes related to colorectal cancer (CRC), including genes belonging to the WNT signaling pathway as well as important transcription factors or colon-specific genes that are overexpressed in CRC. The 1.7 Mb of targeted regions were sequenced at coverage ranging from 32× to 45× for the 28 genes. We identified a total of 2342 sequence variations in the CRC and corresponding adjacent normal tissues. Among them, 738 were novel sequence variations based on comparisons with the SNP database (dbSNP135). We validated 56 of 66 SNPs in a separate cohort of 30 CRC tissues using the Sequenom MassARRAY iPLEX platform, suggesting a validation rate of approximately 85% (56/66). Among the exonic variations, we found 15 missense mutations and 21 synonymous SNPs that were predicted to change exonic splicing motifs. We also found 31 UTR SNPs that were predicted to occur at transcription factor binding sites, 20 intronic SNPs located near splice sites, 43 SNPs in conserved transcription factor binding sites and 32 in CpG islands. Finally, we determined that rs3106189, localized to the 5′ UTR of antigen presenting tapasin binding protein (TAPBP), and rs1052918, localized to the 3′ UTR of transcription factor 3 (TCF3), were associated with overall survival of CRC patients.
Highlights
With 639,000 deaths per year worldwide, colorectal cancer is the third most common form of cancer and the second leading cause of cancer-related deaths in the Western world (WHO, February 2009, http://www.who.int/mediacentre/factsheets/fs297/en/) and in China [1,2].
Susceptibility to colorectal cancer has been characterized by the identification of rare inherited mutations in a small number of established genes, such as APC, which was first identified as the familial adenomatous polyposis (FAP) locus gene [3] and contributes to colorectal tumorigenesis [1,4].
We describe an analysis pipeline that consists of (1) initial sequencing of pooled DNA samples, followed by validation and further analysis in larger cohorts of samples to reduce cost, and (2) hypothesis-driven targeted capture and analysis of SNPs and their associations with cancer phenotypes.
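The discovery and validation bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Variant` record and the helpers `split_novel` and `validation_rate` are hypothetical names, and the two toy variants are placeholders rather than study data. Only the 56 of 66 validated SNPs are taken from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called sequence variation."""
    chrom: str
    pos: int
    ref: str
    alt: str
    db_id: Optional[str] = None  # None when the variant is absent from the SNP database


def split_novel(variants: List[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Partition variants into (known, novel) by presence of a database ID."""
    known = [v for v in variants if v.db_id is not None]
    novel = [v for v in variants if v.db_id is None]
    return known, novel


def validation_rate(confirmed: int, tested: int) -> float:
    """Fraction of tested SNPs confirmed in the validation cohort."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    if not 0 <= confirmed <= tested:
        raise ValueError("confirmed must lie between 0 and tested")
    return confirmed / tested


if __name__ == "__main__":
    # Toy placeholders (assumed for illustration, not real coordinates or IDs).
    toy = [
        Variant("chrA", 100, "A", "G", db_id="toy_id_1"),
        Variant("chrA", 250, "C", "T"),
    ]
    known, novel = split_novel(toy)
    print(len(known), len(novel))
    # Counts reported in the abstract: 56 of 66 SNPs validated.
    print(f"{validation_rate(56, 66):.1%}")
```

Keeping the novelty check and the validation rate as separate functions mirrors the two stages in the text: variants are first classified against the database, and a subset is then tested in an independent cohort.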
Summary
With 639,000 deaths per year worldwide, colorectal cancer is the third most common form of cancer and the second leading cause of cancer-related deaths in the Western world (WHO, February 2009, http://www.who.int/mediacentre/factsheets/fs297/en/) and in China [1,2]. Recent studies have demonstrated the potential power of deep re-sequencing of candidate genes in human populations to detect rare variants and aid in the understanding of complex human traits [6]. Cancer genome re-sequencing has been performed using exon amplification and conventional Sanger sequencing [7,8,9]. More recently, whole genome or whole exome sequencing (by exome capture) has been used, owing to technological advances and the reduced cost of next-generation sequencing [10,11,12]. Bass et al. applied whole genome sequencing to the tumors of 9 CRC patients and identified 11 in-frame