Abstract Purpose: Challenges with distinguishing circulating tumor DNA from next-generation sequencing (NGS) artifacts limit variant searches to established solid tumor mutations. Identifying the source(s) of errors associated with the NGS analytics of circulating cell-free DNA (ccfDNA) would enable the determination of an optimal strategy for eliminating noise and broaden ccfDNA clinical applications. Methods: Buffy coat DNA and ccfDNA were isolated from seven healthy adults. For each participant, a single buffy coat DNA library was generated using duplex adapters (dual unique molecular identifiers [UMIs], dual index), while two ccfDNA libraries were separately produced: one library with singleton adapters (single UMI, single index) and one library with duplex adapters. The assignment of a UMI to each template DNA molecule prior to library formation reduces false positives. A family is a set of DNA amplicons (PCR duplicates) with the same UMI. Representing a family with a single consensus sequence reduces PCR errors and sequencing artifacts. Duplex adapters have been developed to abrogate the early PCR errors that beset singleton adapters. Results: The error rate using duplex adapters was significantly lower by 26.4±5.9% (P < 0.001) compared to singleton adapters at family size ≥2, where family size is defined as the number of PCR duplicates that yield a single consensus sequence. Because noise persisted with both singleton and duplex adapters even at large family sizes (i.e., ≥10), we explored potential sources of the residual error. Noise in ccfDNA due to effects from clonal hematopoiesis of indeterminate potential (CHIP) accounted for <4% of error. Removing locations with errors present in all seven samples (i.e., highly patterned error likely due to regions difficult to sequence, align, or both) reduced noise in the duplex and singleton adapters by 18.7±5.1% (P < 0.001) and 14.5±3.1% (P < 0.001), respectively.
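The family and consensus definitions above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes reads arrive as hypothetical `(umi, sequence)` pairs that are already aligned and of equal length, it uses a simple per-position majority vote, and it does not model the strand-pairing step that distinguishes duplex adapters from singleton adapters. The function names and the `min_family_size` parameter are illustrative only.

```python
from collections import Counter, defaultdict


def build_families(reads):
    """Group (umi, sequence) pairs into families keyed by UMI."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return dict(families)


def consensus(seqs):
    """Return the majority base at each position.

    Assumes all sequences have the same length. A position with no
    strict majority is reported as 'N'.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count * 2 > len(column) else "N")
    return "".join(out)


def family_consensuses(reads, min_family_size=2):
    """Collapse each family of at least min_family_size reads to one consensus."""
    return {
        umi: consensus(seqs)
        for umi, seqs in build_families(reads).items()
        if len(seqs) >= min_family_size
    }


# Hypothetical reads: three PCR duplicates share UMI "AAT", one of which
# carries an isolated error; UMI "CCG" has a single read and is dropped.
example_reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),
    ("CCG", "ACGT"),
]
collapsed = family_consensuses(example_reads, min_family_size=2)
```

In this toy input the isolated `T` in the third read is outvoted, which is the sense in which a consensus suppresses late PCR and sequencing errors. An error introduced in an early PCR cycle would be copied into most of the family and would survive the vote, consistent with the residual noise reported at large family sizes.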
Finally, we explored the effects of stochastic noise as a source of error. A complete replicate with duplex adapters was generated beginning from the source ccfDNA and library preparation. Using replicate data, the error rate was reduced by an additional 59.9±4.3% (P < 0.001) for the duplex adapters at family size ≥2. Using duplex adapters, accounting for CHIP artifacts, removing locations with highly patterned errors, and including replicate data reduced error by 85.4±4.6% (P < 0.001) compared to the error rate for singleton adapters at family size ≥2. Error continued to decline with each family size increment. Conclusion: Early stochastic PCR errors are a principal source of NGS noise that persists despite duplex molecular barcoding and after removal of patterned errors. Replicates are necessary to eliminate noise, and their use in NGS analytics may broaden ccfDNA applications, particularly in pre-metastatic and recurrent solid-tumor malignancies, by enabling untargeted variant investigations. Citation Format: Hunter R. Underhill, Preetida J. Bhetariya, Sabine Hellwig, David A. Nix, Carrie L. Fuertes, Gabor T. Marth, Mary P. Bronner. The stochastic nature of errors in next-generation sequencing of circulating cell-free DNA [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 441.
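The two filtering ideas described in the abstract, removing positions with errors in every sample and requiring agreement between replicates, can be sketched as set operations. This is an illustration under stated assumptions, not the method used in the study: calls are represented as hypothetical `(chrom, pos, alt)` tuples, a "patterned" site is taken to be any position with a non-reference call in all samples regardless of allele, and replicate agreement is taken to mean the identical call appears in both replicates. The abstract does not specify these details.

```python
def patterned_sites(calls_by_sample):
    """Return (chrom, pos) sites that carry a call in every sample."""
    site_sets = [
        {(chrom, pos) for chrom, pos, _alt in calls}
        for calls in calls_by_sample.values()
    ]
    return set.intersection(*site_sets) if site_sets else set()


def filter_calls(replicate_1, replicate_2, patterned):
    """Keep calls seen in both replicates and not at a patterned site."""
    shared = replicate_1 & replicate_2
    return {call for call in shared if (call[0], call[1]) not in patterned}


# Hypothetical calls from two samples; chr1:100 is called in both.
calls_by_sample = {
    "sample_1": {("chr1", 100, "T"), ("chr2", 5, "A")},
    "sample_2": {("chr1", 100, "G")},
}
patterned = patterned_sites(calls_by_sample)

# Hypothetical replicate libraries prepared from the same source ccfDNA.
rep_1 = {("chr1", 100, "T"), ("chr2", 5, "A"), ("chr3", 7, "C")}
rep_2 = {("chr1", 100, "T"), ("chr2", 5, "A")}
kept = filter_calls(rep_1, rep_2, patterned)
```

Here the `chr3` call appears in only one replicate and is discarded as stochastic noise, while the `chr1` call is discarded because the site is patterned across samples. Only the `chr2` call survives both filters.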