Abstract
The heterogeneity of breast cancer has now been demonstrated in a number of large-scale genomics projects, and many recent efforts have focused on describing subgroups of patients by identifying distinguishing biological features that may also be relevant for use in the clinic. Recently, as part of the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), we performed an integrated analysis of copy number aberrations (CNA) and gene expression profiles for almost 2,000 primary tumors, and identified ten robust Integrative Clusters (IntClusts) with distinct genomic properties. To further refine these IntClusts, we have undertaken a targeted sequencing approach to investigate the mutational profiles of 178 breast cancer-related genes, selected because they appeared as putative drivers in previous sequencing projects. We used Illumina's transposase-based Nextera method to prepare libraries for sequencing, with custom oligos used for in-solution capture of the exons within the targeted genes. Overall, targeted sequencing with an average coverage depth of 200X was performed for 2,600 primary tumors, of which over 550 had matched normal DNA extracted from either adjacent breast tissue or peripheral blood lymphocytes. The GATK pipeline was used for data preprocessing, with custom scripts used for calling single nucleotide variants and short indels and for classifying their somatic status. We have also determined the copy number status of the target genes in order to obtain a complete, platform-consistent characterization of the tumors. Independent expression and SNP6.0 data are available for the majority of samples, while a minimum of 5 years of clinical follow-up data are available for all patients. We are also performing whole exome and targeted RNA sequencing for a subset of the tumors. Our findings so far have both corroborated and added to our knowledge of the breast cancer genome.
For example, mutation frequencies of approximately 30% in PIK3CA and TP53 are observed in the overall population; however, the additional resolution provided by the IntClusts has allowed us to identify a subgroup of ER+ tumors that has a lower PIK3CA mutation rate but a higher TP53 mutation rate (IntClust6; 11% with PIK3CA mutations, 47% with TP53 mutations) than other ER+ clusters (e.g. IntClust3; 64% with PIK3CA mutations, 10% with TP53 mutations). Furthermore, there is evidence to suggest that PIK3CA mutations may be associated with survival in this subgroup, even though the gene's prognostic power is otherwise mainly limited to ER- tumors. PIK3CA mutation status also appears to have prognostic value in IntClust4, a CNA-devoid subtype with low genomic instability. We have also investigated the mutation frequencies and mutation classes of a number of other genes, and have identified other candidates that may help refine classification according to the IntClusts. These initial findings illustrate the power of complete genomic characterization of a large number of tumors with detailed long-term clinical information, and demonstrate the utility of the IntClust system in distinguishing between tumors with what may be subtle biological differences. Moreover, we believe that this comprehensive resource will be of great use to the breast cancer field in answering questions of both biological and clinical relevance.

Citation Format: Bernard Pereira, Suet-Feung Chin, Oscar M. Rueda, Michelle Osborne, Helen Northern, John Peden, Sarah-Jane Dawson, Jean Abraham, METABRIC Group, NeoTango Group, Mark Ross, David Bentley, Carlos Caldas. A targeted sequencing approach to characterize the mutational profiles of 178 genes in 2600 primary breast tumors. [abstract]. In: Proceedings of the AACR Special Conference on Advances in Breast Cancer Research: Genetics, Biology, and Clinical Applications; Oct 3-6, 2013; San Diego, CA. Philadelphia (PA): AACR; Mol Cancer Res 2013;11(10 Suppl):Abstract nr B017.