Abstract
Regulatory authorities, including the FDA, are well aware of the crucial role Next Generation Sequencing (NGS) plays in precision medicine. However, the lack of thoroughly characterized and community-validated reference samples creates a challenge for the development and review of NGS applications in precision oncology. Over 200 members from over 60 institutions have formed the Somatic Mutation Working Group within the FDA-led Sequencing Quality Control Phase II (SEQC-II) Consortium to address this challenge. We have developed strategies to establish community-standard reference samples. We used multiple orthogonal sequencing technologies, sequencing replicates, sequencing centers, and bioinformatics analysis pipelines for mutation detection, thus minimizing biases related to any particular sequencing platform, assay, or informatics software. By combining these data with targeted sequencing and microarray technologies, we established a high-confidence mutation call set (Gold Set). Here, we present our initial reference samples: a human triple-negative breast cancer cell line (HCC1395) and a matched B lymphocyte-derived normal cell line (HCC1395BL). The first public release of the Gold Set includes fully phased germline variants, high-confidence somatic single nucleotide variants (SNV), small insertions and deletions (InDel), copy number variations (CNV), and structural variations (SV). For variant discovery, we performed whole genome sequencing (WGS) for each of the tumor-normal genomes to combined depths of 650X from 12 replicates on Illumina HiSeq sequencers, 380X from 9 replicates on an Illumina NovaSeq sequencer, 1000X from 11 replicates on the 10X Genomics platform, and 50X on a PacBio sequencer. In addition, we performed Hi-C sequencing to 34X.
For variant validation, we performed WGS to 300X for seven different tumor purities in a tumor-normal titration series, whole exome sequencing (WES) to a total of 12000X from 14 replicates, AmpliSeq targeted sequencing to 1000X, RNASeq to 186 million reads, Affymetrix GeneChip array to 2.1 million probes, and Cytogenetic arrays to 4.3 million probes. HCC1395 has a highly rearranged near-triploid genome with 66 chromosomes. The confidence level of each somatic mutation in the Gold Set was derived from the cross-institution and cross-aligner reproducibility of that mutation call. The Gold Set contains approximately 40,000 high-confidence somatic SNVs and approximately 2,000 high-confidence somatic InDels, with variant allele frequencies (VAF) ranging between 2% and 100%. The VAFs of both somatic and germline variants are consistent with our CNV calls. We will continue to accept input from the community to improve the accuracy and completeness of our Gold Set, so that the reference samples can serve as a community standard well into the future.
Citation Format: Li Tai Fang, Wenming Xiao, Somatic Mutation Working Group of the SEQC-II Consortium. Establishing reference samples for universal benchmarking study of NGS technologies [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 3517.