Abstract

Background
Accurate detection of tumor sequence when the overall tumor content (TC) of a sample is low (<0.01%) is critical for tumor-informed minimal residual disease (MRD) monitoring. Current methods generally rely on detecting multiple co-mutations at low variant allele fraction (VAF). At low TC and low DNA input (<30 ng, about 9,000 genome equivalents), many variant sites are expected to carry no tumor-derived fragments at all, or only enough to rise marginally above background noise. A method that does not rely on positively identifying individual mutations would therefore be useful for detecting early disease recurrence.

Methods
We developed a maximum likelihood (ML) method that infers the TC of an MRD sample directly, rather than the VAF at individual sites. The method considers the frequency of non-reference reads at each variant site together with estimated error rates. To evaluate it at low TC, we carried out spike-in simulations of MRD samples across sample TCs from 0 to 0.1%. For each simulation, we randomly selected one normal sample from a pool of sequenced normals to serve as background. Using this background, we randomly selected sites and mutations to represent somatic variants, sampled the existing noise at each site, and sampled the number of ctDNA fragments and reads given the tumor content. Intrinsic error rates were modeled as the mean error of the sample, the per-reference-base error, or the mutation-specific error. An additional, randomly selected normal sample was used to estimate extrinsic, per-site error rates. Finally, we created contrived tumor-informed MRD samples by serial two-fold dilution of tumor DNA into a normal background (0.2% to 0.0125% sample TC), sequenced them by amplicon sequencing, and characterized them with our ML method.
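For scale: 9,000 genome equivalents at 0.01% TC gives 9,000 × 0.0001 ≈ 0.9 tumor fragments per site, so under a Poisson model roughly e^(−0.9) ≈ 41% of sites would carry no tumor fragment at all. The Python sketch below illustrates how a single TC parameter can still be estimated by pooling all sites. It is a minimal sketch under stated assumptions, not the authors' implementation: the binomial read model, the `tumor_af` parameter (0.5, assuming a clonal heterozygous variant in a diploid tumor), the one-read-per-fragment simplification, the pseudocount in the extrinsic error estimate, the lognormal per-site error distribution, and all function names (`log_likelihood`, `estimate_tc`, `simulate_sample`, `extrinsic_error`) are hypothetical.

```python
import numpy as np


def log_likelihood(theta, alt, depth, err, tumor_af=0.5):
    """Binomial log-likelihood (up to a constant) of non-reference counts.

    Assumed model (illustrative only): at site i a fragment carries the somatic
    variant with probability q = theta * tumor_af; otherwise it shows a
    non-reference base by error with probability err[i].
    """
    q = theta * tumor_af
    p = np.clip(q + (1.0 - q) * err, 1e-12, 1.0 - 1e-12)
    return float(np.sum(alt * np.log(p) + (depth - alt) * np.log1p(-p)))


def estimate_tc(alt, depth, err, tumor_af=0.5, tc_max=0.002, n_grid=2001):
    """Grid-search ML estimate of TC, plus a likelihood-ratio statistic vs. TC = 0."""
    grid = np.linspace(0.0, tc_max, n_grid)  # grid[0] is the null hypothesis TC = 0
    ll = np.array([log_likelihood(t, alt, depth, err, tumor_af) for t in grid])
    best = int(np.argmax(ll))
    return float(grid[best]), float(2.0 * (ll[best] - ll[0]))


def simulate_sample(theta, site_err, n_fragments=9000, tumor_af=0.5, rng=None):
    """Simulate non-reference counts per site (hypothetical: one read per fragment)."""
    rng = np.random.default_rng() if rng is None else rng
    depth = np.full(len(site_err), n_fragments)
    tumor = rng.binomial(depth, theta)              # tumor-derived fragments per site
    mutant = rng.binomial(tumor, tumor_af)          # tumor fragments carrying the variant
    noise = rng.binomial(depth - mutant, site_err)  # background errors on remaining fragments
    return mutant + noise, depth


def extrinsic_error(normal_alt, normal_depth, pseudo=0.5):
    """Per-site error rate from an independent normal sample, with a pseudocount."""
    return (normal_alt + pseudo) / (normal_depth + 2.0 * pseudo)


rng = np.random.default_rng(7)
site_err = rng.lognormal(mean=np.log(1e-4), sigma=0.5, size=32)  # hypothetical true per-site error
normal_alt, normal_depth = simulate_sample(0.0, site_err, rng=rng)  # independent "normal" sample
err_hat = extrinsic_error(normal_alt, normal_depth)

alt_hi, depth_hi = simulate_sample(0.001, site_err, rng=rng)   # 0.1% TC
tc_hi, lr_hi = estimate_tc(alt_hi, depth_hi, err_hat)
alt_null, depth_null = simulate_sample(0.0, site_err, rng=rng)  # 0% TC
tc_null, lr_null = estimate_tc(alt_null, depth_null, err_hat)
print(f"TC=0.1%: estimate {tc_hi:.4%}, LR {lr_hi:.1f}")
print(f"TC=0%:   estimate {tc_null:.4%}, LR {lr_null:.1f}")
```

A bounded grid search is used here only because TC is a single parameter confined to a small non-negative range; a bounded scalar optimizer would serve equally well. In practice, a detection threshold on the likelihood-ratio statistic would presumably need to be calibrated against normal samples to reach a target specificity.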
Results
In our spike-in simulations using 32 variant sites, we achieved 99% sensitivity for detecting tumor sequence at a TC of 0.01% (an average of ~1 tumor fragment per site) and 73% sensitivity at 0.005% (~0.5 tumor fragments per site), with 99.9% specificity when using extrinsic error estimated from a normal sample. With 16 sites, sensitivity was 95% and 68% at the same TCs, with 99.6% specificity. Without extrinsic error, specificity dropped sharply, to 71% (16 sites) and 77% (32 sites) at 0.005% TC. This highlights the value of extrinsic error when estimating low TC. In our contrived samples, TC estimates from our ML approach correlated well with the expected values (r = 0.99).

Conclusions
With our approach, we accurately quantified TC in both simulated and contrived samples. In simulated samples, we detected TC below 0.01% (68% sensitivity at 0.005% TC) with high specificity using only 16 sites. Importantly, TC estimation does not rely on identifying individual positive sites. While modeling the intrinsic sequencing error rates of a sample improves our ability to characterize TC, extrinsic, per-site error rates are needed to control false positives introduced by constitutively noisy sites.

Citation Format: Andrew Conley, Huazhang Li, Alex V. Kotlar. Accurate detection of tumor sequence at low tumor content for MRD [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 5066.