Abstract

Background
Radiogenomics is an emerging tool for screening molecular biomarkers in diagnostic and prognostic assessment through the extraction of quantitative data from medical images (Shui et al., Front Oncol 2021). In this study, we aimed to develop machine learning (ML) models to predict the most common genomic alterations present in our subset of renal cell carcinoma (RCC) patients (pts).

Methods
Pts with tissue genomic testing from CT-guided biopsy samples were identified retrospectively. Genomic testing was done via the GEM ExTra assay, a CAP-accredited, CLIA-certified test encompassing tumor whole exome sequencing and whole transcriptome sequencing (TGen; Phoenix, AZ). Biopsy sample collection sites were identified on pre-biopsy contrast CT images, and the lesions were segmented with ITK-SNAP software; 510 radiomic features were then extracted from the segmentations with Pyradiomics. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used to select the most relevant features. Logistic regression (LR) and support vector machine (SVM) classifiers were built to predict PBRM1, VHL, and SETD2 gene alterations. Predictive performance was evaluated via leave-one-out cross-validation with multiple metrics, including the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Feature importance was evaluated with the Shapley additive explanations (SHAP) method.

Results
A total of 14 RCC pts (10 male, 4 female) with genomic testing from CT-guided biopsies were identified. The majority of pts were White (85.7%) and had clear cell histology (71.4%). The most common locations for the CT-guided biopsy were lung (18.2%), soft tissue (13.6%), kidney (9.1%), and bone (9.1%). The most common alterations were seen in the PBRM1 (50%), VHL (43.9%), and SETD2 (35.7%) genes.
PBRM1 alterations were predicted with the highest AUROC (0.84) and AUPRC (0.88) using the SVM classifier, followed by SETD2 (AUROC = 0.78, AUPRC = 0.66) using the LR classifier and VHL (AUROC = 0.56, AUPRC = 0.65) using the SVM classifier. Notably, all three models showed good sensitivity in classifying gene mutation status (PBRM1: 0.86; VHL: 0.88; SETD2: 0.89). Among all radiomic features, first-order, Gray Level Size Zone Matrix, and Gray Level Dependence Matrix features were the most important for the predictions.

Conclusions
Using a CT-based radiomics analysis of the biopsy area, we showed that SVM and LR prediction models could predict PBRM1, VHL, and SETD2 mutations with high accuracy. These models may assist in identifying potentially actionable alterations and facilitate treatment selection for RCC. Further, more extensive studies are warranted to validate our findings and improve our models.

CDMRP DOD Funding: no
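
The evaluation scheme named in the Methods (leave-one-out cross-validation scored by AUROC and AUPRC) can be illustrated with a minimal sketch. Each leave-one-out fold holds out a single patient, so a curve-based metric cannot be computed per fold; one common approach, assumed here and not confirmed by the abstract, is to pool the held-out scores and compute each metric once. The nearest-centroid scorer below is a hypothetical stand-in for the LR and SVM classifiers, the toy data are invented for illustration, and the AUPRC is approximated by average precision. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    if hits == 0:
        raise ValueError("average precision needs at least one positive")
    return total / hits


def fit_centroids(X: List[List[float]], y: List[int]) -> dict:
    """Hypothetical stand-in model: one mean feature vector per class."""
    def mean(rows: List[List[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    return {c: mean([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}


def centroid_score(model: dict, x: List[float]) -> float:
    """Higher score means x lies closer to the positive-class centroid."""
    return math.dist(x, model[0]) - math.dist(x, model[1])


def leave_one_out_scores(X: List[List[float]], y: List[int],
                         fit: Callable, score: Callable) -> List[float]:
    """Refit on n-1 samples and score the held-out one, for every sample.

    Any feature selection step should run inside `fit` so that it only
    ever sees the training portion of each fold.
    """
    pooled = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pooled.append(score(model, X[i]))
    return pooled


# Invented single-feature toy data (not from the study).
X = [[0.2], [0.4], [0.3], [1.1], [0.9], [1.3]]
y = [0, 0, 0, 1, 1, 1]

pooled = leave_one_out_scores(X, y, fit_centroids, centroid_score)
print("pooled AUROC:", auroc(y, pooled))
print("pooled average precision:", average_precision(y, pooled))
```

With only 14 patients, pooled estimates of this kind rest on very few positive and negative cases per gene, which is one reason validation in a larger cohort matters.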