Abstract

Background
Early detection of cancer has significant potential to improve human health and society by reducing cancer-related morbidity and mortality. Whereas previous approaches to identifying cancer-informative biomarkers have been predominantly statistical, Harbinger Health has drawn on foundational discoveries in developmental biology to design a targeted methylation assay for early cancer detection from cell-free DNA (cfDNA) extracted from plasma. Using this biologically informed approach, we developed a fixed, multi-layered, logistic regression-based machine learning algorithm that returns a binary (yes/no) cancer classification for cfDNA samples processed through our assay. The algorithm was trained on an in-house dataset of 1046 samples (621 cancer, 425 non-cancer). We have previously reported high sensitivity for multi-cancer detection, including for early-stage disease.

Methods
Here, we perform a comprehensive, independent analytical validation of our assay and algorithm in 69 subjects: 19 with newly diagnosed, treatment-naïve cancer (8 cancer types) and 50 individuals with no history, diagnosis, or symptoms of cancer. In total, we used 122 replicate samples to assess reproducibility and precision, 8 non-template controls (water) to determine the limit of blank (LOB), and cohorts of matched biopsy and cfDNA samples to determine the tumor content limit of detection (LOD).

Results
Precision was assessed in five sub-studies by measuring the concordance of the predicted binary cancer classification between replicate samples. Concordance (95% CI) was 0.90 (0.764–0.959) for inter-run precision, 1.00 (0.796–1.000) for intra-run precision, 1.00 (0.871–1.000) for inter-operator precision, 0.96 (0.930–0.998) for inter-instrument precision, and 0.83 (0.641–0.933) for inter-day precision.
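Each precision figure is the proportion of replicate pairs whose binary calls agree, reported with a 95% confidence interval. The abstract does not name the interval method; the sketch below assumes the Wilson score interval, one common choice for binomial proportions near 0 or 1. The function names (`replicate_concordance`, `wilson_interval`) and the paired-list input layout are hypothetical, and the example calls are made up rather than taken from the study.

```python
import math
from typing import Sequence, Tuple


def replicate_concordance(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> Tuple[int, int]:
    """Count replicate pairs whose binary (yes/no) cancer calls agree.

    Hypothetical layout: calls_a[i] and calls_b[i] are the calls for the
    two replicates of sample i (e.g. run 1 vs run 2 for inter-run precision).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicate call lists must have equal length")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree, len(calls_a)


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    Assumption: the source does not name its CI method; Wilson is one common
    choice. z = 1.959964 gives a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point round-off at p = 0 or p = 1.
    return max(0.0, center - half), min(1.0, center + half)


# Usage with made-up calls (not study data): the fifth replicate pair disagrees.
run1 = [True, True, False, True, False, True, True, False, True, True]
run2 = [True, True, False, True, True, True, True, False, True, True]
agree, n = replicate_concordance(run1, run2)  # 9 of 10 pairs agree
low, high = wilson_interval(agree, n)
print(f"concordance {agree / n:.2f} (95% CI {low:.3f}-{high:.3f})")
```

When every pair agrees, the Wilson upper bound is 1 and the lower bound reduces to n / (n + z²), which is why a perfect concordance can still carry a fairly wide interval for small replicate sets.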
To determine the LOB, we carried the 8 non-template controls (water) through the entire assay; on average, they yielded ∼0.02% of the unique aligned reads obtained from a true sample on the same sequencing run. Finally, to assess the tumor content LOD, we developed a method that uses methylation signal to estimate the amount of tumor-derived DNA in each cfDNA sample, and we validated these estimates against whole exome sequencing, an orthogonal gold-standard approach. We then assessed the relationship between tumor content and classifier sensitivity using the 625 cancer cfDNA samples in our training data, and determined the tumor content LOD, defined as the tumor content at which 95% of true cancer samples were correctly predicted, to be 0.037%.

Conclusion
Our assay shows high performance and high technical reproducibility. Our previously reported high sensitivity for stage 1 and stage 2 cancers, together with the extremely low tumor content LOD reported here, supports our ability to perform early-stage multi-cancer detection, where levels of circulating tumor DNA are low.