Abstract

Background: In 2010, the scientific community reached a consensus on the subtyping of medulloblastoma (WNT/β-catenin, Sonic Hedgehog, Group 3, and Group 4). Research has shown that the subtypes display distinct genotypic characteristics and prognoses. Currently, transcriptomic profiling via NanoString is well accepted for medulloblastoma subtyping. However, this platform requires high-quality RNA samples, so an alternative approach is needed for more robust and accurate subtyping. In this study, we explored the feasibility of using a machine learning approach for medulloblastoma subtyping based on genome-wide somatic copy number alterations (SCNAs).

Method: We retrieved SNP array CEL data for 1097 medulloblastoma samples (GSE37385) from GEO. After QC, 800 samples remained for further analysis. A further 32 medulloblastoma samples were collected from the study by Robinson G, et al. as an independent validation set. PennCNV was used to calculate the log R ratio and B allele frequency. DNAcopy was used for CNV segmentation. GISTIC2 was applied to obtain arm-level and focal SCNAs. A number of classifiers were trained using SCNAs together with age and gender. Feature selection was conducted in a model-specific manner. Model performance was assessed using AUROC.

Result: The benchmark on the validation set showed that the accuracy of Naïve Bayes, Random Forest, AdaBoost, Logistic Regression and SVM was 76.25%, 75%, 75%, 77.5% and 80.62%, respectively. The corresponding AUROC values were 90.98%, 92.3%, 91.32%, 92.1% and 92.02%. The relative overfitting and the comparable performance across most models imply that SCNAs have moderate predictive power for medulloblastoma subtyping. Overall, the SVM with a linear kernel showed better and more balanced performance than the other models. Feature importance evaluation via Random Forest and AdaBoost showed higher predictive power for arm-level SCNAs of 6p, 6q, 7p, 7q, 16q, 17p and 17q, consistent with previous studies. Focal SCNAs at SIRPB1, as well as age, also had high predictive power.
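The abstract does not state how AUROC was aggregated over the four subtypes. As a minimal sketch, assuming a one-vs-rest macro average (an assumption, not the authors' stated procedure), the metric could be computed from per-sample class probabilities as follows. The function names and the toy probabilities are hypothetical and are not data from the study.

```python
from typing import List, Sequence

SUBTYPES = ["WNT", "SHH", "Group3", "Group4"]


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive sample is scored
    above a random negative sample (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(probs: List[List[float]], y_true: List[str],
                    classes: List[str]) -> float:
    """One-vs-rest AUROC per class, averaged with equal class weights.
    Assumption: the abstract does not say which averaging was used."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in probs]            # probability of class k
        labels = [1 if y == cls else 0 for y in y_true]
        aucs.append(binary_auroc(scores, labels))
    return sum(aucs) / len(aucs)


# Toy predicted probabilities (columns follow SUBTYPES); illustrative only.
toy_probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.10, 0.50, 0.30],
    [0.10, 0.10, 0.45, 0.35],
    [0.05, 0.05, 0.30, 0.60],
    [0.10, 0.10, 0.35, 0.45],
]
toy_labels = ["WNT", "SHH", "Group3", "Group4", "Group4", "Group3"]

toy_auroc = macro_ovr_auroc(toy_probs, toy_labels, SUBTYPES)
print(f"macro one-vs-rest AUROC: {toy_auroc:.4f}")
```

In this toy example WNT and SHH are separated perfectly, while Group 3 and Group 4 each lose one pairwise comparison to the other, which is the kind of confusion the abstract reports between those two groups.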
NanoString performs poorly at distinguishing Group 3 from Group 4. Our method showed the same tendency, indicating both transcriptomic and genomic similarity between these two groups. We therefore allowed up to 10% of validation samples, those with highly similar SCNA characteristics, to be reported as ‘prediction failure’. With this modification, the accuracy of most models rose by 3% on average.

Conclusion: Our study evaluated the feasibility of using genome-wide SCNA profiles for medulloblastoma subtyping. The benchmark showed that SCNAs have moderate predictive power. Among the machine learning approaches tested, the SVM with a linear kernel demonstrated superior performance.

Citation Format: Cheng Yan, Dan Wang, Lan Su, Yufei Yang, Hao Du. A machine learning approach for accurate medulloblastoma subtyping using arm-level SCNAs [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 155.
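The ‘prediction failure’ allowance described in the Result can be illustrated with a reject option. The abstract does not say how the 10% of samples were chosen, so the sketch below assumes they are the samples with the smallest margin between the two highest predicted subtype probabilities. That selection rule, the function name `accuracy_with_rejection`, and the toy data are all hypothetical.

```python
import math
from typing import List, Tuple

SUBTYPES = ["WNT", "SHH", "Group3", "Group4"]


def accuracy_with_rejection(probs: List[List[float]], y_true: List[str],
                            classes: List[str],
                            reject_fraction: float = 0.10
                            ) -> Tuple[float, List[int]]:
    """Accuracy on the retained samples after flagging the least confident
    fraction as 'prediction failure'.
    Assumption: confidence = top-1 probability minus top-2 probability."""
    n = len(y_true)
    n_reject = math.floor(reject_fraction * n)

    def margin(row: List[float]) -> float:
        top = sorted(row, reverse=True)
        return top[0] - top[1]

    def predict(row: List[float]) -> str:
        return classes[max(range(len(classes)), key=lambda k: row[k])]

    # Indices ordered from least to most confident; reject the first n_reject.
    by_confidence = sorted(range(n), key=lambda i: margin(probs[i]))
    rejected = set(by_confidence[:n_reject])
    kept = [i for i in range(n) if i not in rejected]
    correct = sum(1 for i in kept if predict(probs[i]) == y_true[i])
    return correct / len(kept), sorted(rejected)


# Toy probabilities (columns follow SUBTYPES); illustrative only.
reject_probs = [
    [0.85, 0.05, 0.05, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.10, 0.75, 0.10, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.70, 0.20],
    [0.05, 0.05, 0.44, 0.46],  # ambiguous Group3/Group4 call, misclassified
    [0.05, 0.05, 0.20, 0.70],
    [0.05, 0.05, 0.30, 0.60],
    [0.05, 0.05, 0.55, 0.35],  # confident but wrong Group3 call
]
reject_labels = ["WNT", "WNT", "SHH", "SHH", "SHH",
                 "Group3", "Group3", "Group4", "Group4", "Group4"]

base_acc, base_rejected = accuracy_with_rejection(
    reject_probs, reject_labels, SUBTYPES, reject_fraction=0.0)
rej_acc, rej_rejected = accuracy_with_rejection(
    reject_probs, reject_labels, SUBTYPES, reject_fraction=0.10)
print(f"accuracy without rejection: {base_acc:.3f}")
print(f"accuracy with 10% rejection: {rej_acc:.3f}, rejected: {rej_rejected}")
```

Here the single rejected sample is the ambiguous Group 3/Group 4 call, so accuracy on the retained samples rises. A confident but wrong call is not rejected, so this rule only helps where errors concentrate among low-margin samples.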
