Abstract

Diffuse large B-cell lymphoma (DLBL) has been classified into two distinct molecular subtypes according to cell of origin, namely germinal center B-cell-like (GCB) and activated B-cell-like (ABC), using the Hans algorithm. These two entities also differ in prognosis. We assessed CT-based tumor radiomic features with machine learning classifiers to distinguish the two molecular subtypes. One hundred one patients were accrued in the study after institutional ethics committee approval: 59 had the GCB subtype and 42 the ABC subtype. The lesion with the maximum SUV was delineated using a semi-automated segmentation tool (3D Slicer), and radiomic features were extracted using Pyradiomics. A total of 1030 features were extracted, including shape features, first-, second-, third- and higher-order features, and wavelet-based features. Recursive feature elimination was used for feature selection, and the optimal 10 features were selected. Five machine learning (ML) classifiers were used to build models to sub-classify the two molecular subtypes (Table 1). Model performance was assessed using accuracy, precision, recall, F1 score and the area under the receiver operating characteristic curve (AUC). The models were constructed using two strategies: (1) 5-fold internal cross-validation and (2) splitting the data into training (70%) and validation (30%) sets. Recursive feature elimination with a random forest estimator was used to select 7 radiomic features, which were then used to build models with the 5 ML classifiers. Of the five classifiers, the Random Forest Classifier (RFC) with the 70:30 training:test split performed best, with the highest accuracy (80%), an AUC of 0.87, and recall (sensitivity), F1 score and precision (positive predictive value, PPV) of 82% each.
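The two validation strategies described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not say whether the splits were stratified by subtype or which random seed was used. Stratification, the seed, and the function names `stratified_kfold` and `stratified_holdout` are hypothetical choices made here for illustration. Only the class counts (59 GCB, 42 ABC) come from the abstract.

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by class label (insertion order preserved)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def stratified_kfold(labels, k=5, seed=0):
    """Hypothetical helper: k (train_idx, test_idx) pairs, each class spread evenly over folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter across classes keeps fold sizes balanced
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    splits = []
    for f in range(k):
        # Every sample outside fold f is used for training in that round.
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, sorted(folds[f])))
    return splits


def stratified_holdout(labels, test_frac=0.3, seed=0):
    """Hypothetical helper: (train_idx, test_idx), holding out about test_frac of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Class counts from the abstract: 59 GCB and 42 ABC patients.
labels = ["GCB"] * 59 + ["ABC"] * 42
for train_idx, test_idx in stratified_kfold(labels, k=5):
    print(len(train_idx), len(test_idx))  # test folds of 21, 20, 20, 20, 20 patients
train_idx, test_idx = stratified_holdout(labels, test_frac=0.3)
print(len(train_idx), len(test_idx))  # 70 training and 31 held-out patients
```

In a pipeline like this, a feature selector such as recursive feature elimination would normally be fitted only on each round's training indices. If it is fitted on all 101 patients before splitting, the held-out estimates can come out optimistic.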
With the internal 5-fold cross-validation strategy, RFC again outperformed the other ML classifiers, with an accuracy of 75%, AUC of 0.80, recall of 79%, PPV of 77% and specificity of 79%. Machine-learning-based analysis of radiomic features extracted from pre-treatment CT images can provide a simple, non-invasive method for predicting DLBL molecular subtype with favorable accuracy. The Random Forest Classifier was the most accurate in distinguishing the GCB subtype from the ABC subtype.
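The performance measures reported above can be computed from a confusion matrix and from predicted scores, as in the minimal standard-library sketch below. It rests on several assumptions. The abstract does not state which subtype was treated as the positive class, so GCB is chosen here only for illustration, and that choice changes precision, recall and specificity. The function names are hypothetical. The toy predictions are invented and are not study data.

```python
def binary_metrics(y_true, y_pred, positive="GCB"):
    """Confusion-matrix metrics for a two-class problem; `positive` is an assumed positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value (PPV)
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    # F1 is the harmonic mean of precision and recall, not specificity.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive="GCB"):
    """ROC AUC as the probability that a random positive case outscores a random negative one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, not data from the study.
y_true = ["GCB", "GCB", "GCB", "ABC", "ABC"]
y_pred = ["GCB", "GCB", "ABC", "ABC", "GCB"]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]  # assumed predicted probability of GCB
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

F1 is the harmonic mean of precision and recall. When both equal 82%, as reported for the 70:30 RFC model, F1 is also 82%.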
