Abstract

Formalin-fixed, paraffin-embedded (FFPE) tissue specimens are routinely used in pathological diagnosis. However, formalin-induced artifactual mutations make accurate analysis of next-generation sequencing (NGS) data challenging. Existing FFPE filtering tools tend to classify all low-frequency variants as artifacts. This makes it difficult to identify true variants with low allele frequencies and compromises the analysis of tumors with low tumor cell content. To address this issue, we developed DEEPOMICS FFPE, an AI model designed to distinguish true variants from artifacts. We gathered paired whole-exome sequencing data from fresh-frozen and FFPE samples of 24 tumors from public sources. These data were split 70:30 into training and validation sets. The deep neural network, which has three hidden layers, was trained on input features derived from the outputs of the MuTect2 caller. We used the SHapley Additive exPlanations (SHAP) algorithm to identify relevant features and then refined them based on the training outcomes. We compared the final DEEPOMICS FFPE model with existing tools (MuTect filter, FFPolish, and SOBDetector) on well-defined test datasets. Our analysis identified 43 features that distinguish FFPE artifacts, and refining how these features were quantified improved model performance. DEEPOMICS FFPE filtered out 99.6% of artifacts while retaining 85.2% of true variants, achieving an F1-score of 87.8 in the validation dataset and outperforming existing tools. The model also performed consistently for variants with low allele frequencies, with a specificity of 0.998, which suggests that it can identify subclonal variants. Future versions of the model will support other variant callers and whole-genome sequencing data.
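To make the reported metrics concrete, the sketch below shows how retention of true variants (sensitivity), removal of artifacts (specificity), precision and F1 follow from a confusion matrix. It assumes "positive" means a true somatic variant kept by the filter and "negative" means an FFPE artifact removed. The `Confusion` class, the `implied_precision` helper and all counts are hypothetical illustrations, not part of DEEPOMICS FFPE. The helper inverts F1 = 2PR/(P + R) to solve for precision. If the abstract's 87.8 is a percentage F1 on the true-variant class and 85.2% is that class's recall, this gives an implied precision of about 0.906. That figure is a back-of-envelope reading under those assumptions, not a value the authors report.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion matrix for an FFPE artifact filter.

    Assumption: positive = true somatic variant, negative = FFPE artifact.
    """
    tp: int  # true variants retained by the filter
    fn: int  # true variants wrongly filtered out
    tn: int  # artifacts correctly filtered out
    fp: int  # artifacts wrongly retained

    def sensitivity(self) -> float:
        # Fraction of true variants retained (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of artifacts filtered out.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of retained calls that are true variants.
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1 (requires 2R > F1)."""
    return f1 * recall / (2 * recall - f1)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    c = Confusion(tp=852, fn=148, tn=9960, fp=40)
    print(round(c.sensitivity(), 3))  # 0.852
    print(round(c.specificity(), 3))  # 0.996
    print(round(c.f1(), 3))
    # Implied precision under the stated reading of the abstract's numbers.
    print(round(implied_precision(0.852, 0.878), 3))  # 0.906
```

Because artifacts usually far outnumber true variants in FFPE data, even a small fraction of retained artifacts can noticeably lower precision. This is why F1 can sit below both the sensitivity and the specificity figures.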
This newly developed tool has potential applications in designing capture panels for personalized circulating tumor DNA (ctDNA) assays and in identifying candidate neoepitopes for personalized vaccine development. DEEPOMICS FFPE is freely available for research use on the web at http://deepomics.co.kr/ffpe.

Citation Format: Dong-hyuk Heo, Inyoung Kim, Heejae Seo, Seong-Gwang Kim, Minji Kim, Jiin Park, Hongsil Park, Seungmo Kang, Juhee Kim, Soonmyung Paik, Seong-Eui Hong. Reducing artifactual somatic variant calls from formalin-fixed paraffin-embedded specimens by using DEEPOMICS FFPE, a bioinformatic approach based on deep neural networks [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2275.
