Abstract

Introduction: Cancer staging can consume time and money that would otherwise go toward patient management. In this study, we aim to develop a machine learning-based model for early TNM staging.

Methods: Normalized ribonucleic acid sequencing (RNA-seq) count data for melanoma patients were extracted from The Cancer Genome Atlas (TCGA). Six experiments were run to train machine learning models for nodal metastasis, distant metastasis, combined (nodal or distant) metastasis, and higher vs lower tumor stage (T4/T3 vs T2/T1/Tis). All datasets were split 80/20 into training and test sets. The Synthetic Minority Oversampling Technique (SMOTE) was used to address the imbalanced distribution of the outcome. Model performance was assessed by sensitivity, specificity, and positive and negative predictive values (PPV and NPV), all expressed as percentages, and by the area under the receiver operating characteristic curve (AUROC).

Results: The best model for nodal metastasis was a random forest classifier (RF) using targeted gene expression (TE), with sensitivity (98), specificity (100), PPV (100), NPV (94), and AUROC (1.00, 95%CI 0.91-0.99). The RF using TE for distant metastasis showed sensitivity (0), specificity (100), PPV (0), NPV (100), and AUROC (1.00, 95%CI 0.91-1.00). The RF using TE for combined (nodal or distant) metastasis showed sensitivity (98), specificity (100), PPV (100), NPV (99), and AUROC (1.00, 95%CI 0.88-0.98). The RF using TE for tumor staging (DEG, predicting higher stage, i.e., stage 3 or higher) showed sensitivity (100), specificity (100), PPV (100), NPV (100), and AUROC (1.00, 95%CI 0.69-0.89) (Table 1).

Conclusion: Our machine learning models can predict tumor staging, including higher vs lower tumor stage, nodal metastasis, and combined metastasis, with high accuracy. However, these results need further validation.

Table 1.
Machine learning models

| Experiment | Sensitivity | Specificity | PPV | NPV | AUROC | 95%CI | Test Set Accuracy | N |
|---|---|---|---|---|---|---|---|---|
| Nodal Metastasis (Pan-Expression) DNN | 66.22 | 16.98 | 52.69 | 26.47 | 0.495 | 0.25-0.47 | 54.55 | 479 |
| Nodal Metastasis (Targeted Expression) DNN | 81.13 | 100.0 | 100.0 | 88.10 | 0.982 | 0.62-0.83 | 90.21 | 479 |
| Nodal Metastasis (Targeted Expression) RF | 97.89 | 100.00 | 100.00 | 94 | 1.00 | 0.91-0.99 | 96.74 | 458 |
| Distant Metastasis (Targeted Expression) RF | 0.00 | 100.00 | 0.00 | 100 | 1.00 | 0.91-1.00 | 94.44 | 450* |
| Nodal or Distant Metastasis (Targeted Expression) RF | 97.56 | 100.00 | 100.00 | 98.56 | 1.00 | 0.88-0.98 | 98.91 | 458 |
| Tumor Staging (Targeted Expression) RF | 100.00 | 100.00 | 100.00 | 100.00 | 1.00 | 0.69-0.89 | 100.00 | 395 |

*27 cases with distant metastasis

Citation Format: Fahad Shabbir Ahmed, Furqan Bin Irfan. Predicting melanoma staging using targeted RNA sequencing data using machine learning [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 5043.
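The Methods mention SMOTE for the imbalanced outcome. As a rough illustration of the idea only, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified pure-Python version, not the authors' implementation; the function name `smote_oversample`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for this example. The abstract does not say at which step SMOTE was applied; the sketch assumes it is applied to training samples only.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples, each interpolated between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: three minority-class training samples, two features each.
minority_train = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.4]]
new_rows = smote_oversample(minority_train, n_new=4, k=2)
print(len(new_rows))
```

Because each synthetic row is a point on the segment between two real minority samples, it stays inside the range already spanned by the minority class rather than duplicating existing rows.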
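The evaluation measures named in the Methods (sensitivity, specificity, PPV, NPV, AUROC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `staging_metrics` and `auroc` are hypothetical, the confusion-matrix counts are invented for the example and are not taken from the study, and the AUROC is computed with the pairwise rank definition (the fraction of positive/negative pairs in which the positive case scores higher, with ties counted as one half).

```python
def staging_metrics(tp, fp, tn, fn):
    """Compute confusion-matrix metrics as fractions (hypothetical helper).
    A ratio with a zero denominator is returned as None (undefined)."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented counts for a rare outcome: the classifier never predicts "positive".
m = staging_metrics(tp=0, fp=0, tn=95, fn=5)
print(m)

# Invented labels and predicted scores for four cases.
a = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(a)
```

With a rare outcome, a model that predicts only the majority class can still reach high accuracy while its sensitivity is zero and its PPV is undefined, which is why the per-class measures are reported alongside accuracy.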