Abstract

Background: Lung cancer is the leading cause of cancer-related mortality worldwide, and early detection is crucial for improving patient outcomes. Cell-free DNA (cfDNA) whole genome sequencing (WGS) is a non-invasive technique with the potential to detect lung cancer early, but developing accurate and generalizable detection algorithms is challenging because tumor-derived cfDNA is scarce in early-stage cancer.

Methods: We developed a novel deep learning-based multimodal ensemble algorithm that combines two independent models: FEMS (fragment end motif frequency and size) and COV (coverage of genome positions). To assess the performance and ethnic generalizability of the algorithm, we evaluated it on three distinct cohorts: a Korean development cohort, a Korean external cohort, and a Caucasian external cohort. The Korean development cohort, used for model development, comprised 236 lung cancer samples (stage I: 30%, stage II: 12%, stage III: 24%, stage IV: 20%, unknown: 14%) and 2205 normal samples. The Korean external cohort, used for performance evaluation, comprised 288 lung cancer samples (stage I: 31%, stage II: 13%, stage III: 26%, stage IV: 19%, unknown: 12%) and 1463 normal samples. The Caucasian external cohort, used to confirm ethnic generalizability, comprised 126 lung cancer samples (stage I: 17%, stage II: 12%, stage III: 37%, stage IV: 32%, unknown: 3%) and 93 normal samples. All data were generated on the NovaSeq 6000 with a minimum of 40 million reads.

Results: Our multimodal ensemble algorithm achieved high sensitivity and specificity in all three cohorts. In the Korean development cohort, overall sensitivity was 94.1% (95% CI: 90.7% to 97%) and specificity was 85%. In the Korean external cohort, overall sensitivity was 90.4% (95% CI: 82.7% to 98.1%) and specificity was 82%. In the Caucasian external cohort, overall sensitivity was 92.1% (95% CI: 87.3% to 96%) and specificity was 82.8%.
In the Caucasian cohort, sensitivity for early-stage lung cancer was 80%.

Conclusion: Our deep learning-based multimodal ensemble algorithm is a promising tool for early lung cancer detection using low-coverage cell-free whole genome sequencing (LC-cfWGS). It performed well in all three cohorts, including the Caucasian cohort, which suggests that it is robust and generalizable across populations and makes it a potential candidate for clinical implementation.

Citation Format: Tae-Rim Lee, Jin Mo Ahn, Junnam Lee, Dasom Kim, Byeong-Ho Jeong, Dongryul Oh, Mengchi Wang, Michael Salmans, Andrew Carson, Bryan Leatham, Kristin Fathe, Byung In Lee, Chang-Seok Ki, Young Sik Park, Eun-Hae Cho. A deep learning-based multimodal ensemble algorithm for lung cancer early detection with cross-ethnic generalizability [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2411.
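
As an illustration of how the sensitivity and specificity figures reported above relate to sample counts, the following minimal sketch computes both metrics for the Caucasian external cohort and attaches a 95% interval. The counts used (116 of 126 cancer samples detected, 77 of 93 normal samples correctly called) are back-calculated from the reported percentages and cohort sizes and are not stated in the abstract. The abstract also does not say how its confidence intervals were derived; the Wilson score interval is used here purely as an assumption, so the resulting bounds need not match the published ones. The function names are hypothetical.

```python
import math


def proportion(successes: int, total: int) -> float:
    """Return successes / total, rejecting empty groups."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the abstract does not name its CI method, so this is
    only one reasonable choice, not the authors' procedure.
    """
    p = proportion(successes, total)
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return centre - half_width, centre + half_width


# Counts back-calculated from the abstract (assumed, not reported directly):
# 126 cancer samples at 92.1% sensitivity, 93 normal samples at 82.8% specificity.
true_positives, cancer_total = 116, 126
true_negatives, normal_total = 77, 93

sensitivity = proportion(true_positives, cancer_total)
specificity = proportion(true_negatives, normal_total)
sens_low, sens_high = wilson_interval(true_positives, cancer_total)
spec_low, spec_high = wilson_interval(true_negatives, normal_total)

print(f"sensitivity: {sensitivity:.1%} (Wilson 95% CI: {sens_low:.1%} to {sens_high:.1%})")
print(f"specificity: {specificity:.1%} (Wilson 95% CI: {spec_low:.1%} to {spec_high:.1%})")
```

With only 93 normal samples in this cohort, the specificity interval is noticeably wider than the sensitivity interval, which is worth bearing in mind when comparing specificity across the three cohorts.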