101
Background: Noninvasive multi-cancer early detection (MCED), with or without tissue-of-origin (TOO) prediction, has the potential to reduce cancer-related mortality by analyzing circulating cell-free nucleic acids and/or proteins in blood. Accurate prediction of TOO after a positive MCED test would guide the selection of confirmatory tests, thereby expediting definitive diagnosis and the prompt initiation of the most appropriate treatment for the specific cancer type. Here, we report the development of machine learning-based diagnostic classifiers that use serum microRNAs (miRNAs) to predict TOO for 13 cancer types with high accuracy.
Methods: Eight serum miRNA microarray datasets from the Gene Expression Omnibus (GEO), totaling 6,283 patients across 13 cancer types, were used in this study. The patients were split at an approximately 3:2 ratio into a training set (n=3,844) and a validation set (n=2,439). An ensemble of classifiers was constructed in the training set using a one-vs-rest approach, with one classifier for each cancer type. Random forest models with recursive feature elimination (RFE) selected the optimal set of miRNAs, which was then fed into support vector machine (SVM) models to generate a prediction probability for each cancer type. The cancer type with the highest probability was taken as the predicted cancer type. Classifier performance was evaluated in the validation set in two steps: the first used all cancer types, and the second used the top 2 or 3 cancer types from the first step to refine the prediction.
Results: RFE selected 426 miRNAs for building the 13 classification models. In the validation set of 2,439 patients across 12 cancer types, the classifiers correctly predicted the cancer type for 1,922 (79%) samples based on the highest prediction probability. The accuracy increased to 92% and 95% when the top 2 and top 3 predictions, respectively, were considered.
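The decision rule and two-step evaluation described in Methods can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' code: it assumes each one-vs-rest classifier has already produced a probability for its own cancer type, and it omits the random forest/RFE feature selection and SVM training entirely. It reads the second evaluation step as top-k accuracy (is the true type among the 2 or 3 highest-probability types?), which is an interpretation of the abstract. The function names and the toy probabilities are illustrative only and are not study data.

```python
from typing import Dict, List, Sequence

# One entry per cancer type: the probability returned by that type's own
# one-vs-rest binary classifier. Probabilities from independent binary models
# need not sum to 1; only their ranking matters for the decision rule.
Probs = Dict[str, float]


def rank_cancer_types(probs: Probs) -> List[str]:
    """Cancer types sorted from highest to lowest predicted probability.

    sorted() is stable (also with reverse=True), so exact ties keep the
    dict's insertion order.
    """
    return sorted(probs, key=lambda t: probs[t], reverse=True)


def predict_too(probs: Probs) -> str:
    """Top-1 call: the cancer type with the highest probability."""
    return rank_cancer_types(probs)[0]


def top_k_accuracy(all_probs: Sequence[Probs], labels: Sequence[str], k: int) -> float:
    """Fraction of samples whose true type is among their k top-ranked types."""
    hits = sum(label in rank_cancer_types(p)[:k] for p, label in zip(all_probs, labels))
    return hits / len(labels)


def per_type_top_k_accuracy(
    all_probs: Sequence[Probs], labels: Sequence[str], k: int
) -> Dict[str, float]:
    """Top-k accuracy computed separately for each true cancer type."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for p, label in zip(all_probs, labels):
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(label in rank_cancer_types(p)[:k])
    return {t: hits[t] / totals[t] for t in totals}


# Toy, made-up probabilities for three hypothetical validation samples.
samples = [
    {"breast": 0.91, "lung": 0.40, "gastric": 0.10, "liver": 0.05},
    {"breast": 0.20, "lung": 0.35, "gastric": 0.60, "liver": 0.30},
    {"breast": 0.15, "lung": 0.55, "gastric": 0.50, "liver": 0.45},
]
truth = ["breast", "lung", "liver"]

print(predict_too(samples[0]))  # breast
for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(samples, truth, k):.2f}")
# top-1 accuracy: 0.33, top-2 accuracy: 0.67, top-3 accuracy: 1.00
```

The same top-1 arithmetic applied to the reported validation counts gives 1,922 / 2,439 ≈ 0.788, i.e. the stated 79%. Per-type top-3 figures such as those reported for each cancer type would correspond to `per_type_top_k_accuracy(..., k=3)` under this interpretation.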
In particular, based on the top 3 predictions, the accuracy was >95% for bladder, breast, prostate, gastric, glioma and lung cancers; >85% for ovarian, liver and esophageal cancers; 78% for pancreatic cancer and sarcoma; and 67% for colorectal cancer.
Conclusions: With 95% accuracy in narrowing TOO down to 3 organ sites, the miRNA-based TOO classifiers could be used clinically as a reflex test following the simple and highly accurate MCED screening models previously developed (Cancers 2022;14:1450; ESMO 2024). Together, these tests support the development of an inexpensive, accurate and noninvasive blood test for MCED with TOO.