Accurate response prediction allows for personalized cancer management. We developed an unsupervised clustering mechanism to improve the effectiveness and efficiency of feature selection for accurate patient stratification. Forty-three patients with locally advanced rectal cancer (LARC) who underwent neoadjuvant chemoradiation were included; pre-treatment T2 and apparent diffusion coefficient (ADC) MRIs were acquired for each patient. The initial feature space consisted of 200 radiomic features extracted from manually delineated gross tumor volumes (GTVs) on the two MR sequences, plus an additional 960 high-order features extracted with a 3D convolutional neural network (CNN). To remove redundant and irrelevant features, we developed an unsupervised clustering-based feature selection operation that identifies the feature combination with the best potential performance. Conventional feature selection iteratively searches for new feature combinations and trains a new classifier on each candidate feature set to evaluate its performance, so the overall time cost is tremendous. To balance computational cost and search efficiency, we first proposed an unsupervised clustering analysis metric, the Comprehensive Cluster Analysis Index (CCAI), which uses K-means cluster statistics, such as the average distance between the sample points and the cluster centroids, to construct a multiple linear regression model. Second, we drew sampling points by varying the number of selected features and the ratio of radiomic to 3D-CNN features in the feature selection output. Third, we optimized the model using these sampling points to calculate the CCAI. Two typical feature combination search algorithms, random forest recursive feature elimination (RF-RFE) and differential evolution (DE), were used to perform feature selection with CCAI.
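The idea of scoring a candidate feature subset from K-means cluster statistics, instead of training a classifier for every candidate, can be illustrated with a minimal sketch. This is not the authors' CCAI: the two statistics used here (mean sample-to-centroid distance and mean centroid-to-centroid distance), the function names, the toy data, and the regression weights are all assumptions made for illustration. In the text, the weights would come from a multiple linear regression fitted on sampled feature subsets.

```python
import math
import random


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def cluster_stats(samples, feature_idx, k=2):
    """Cluster samples on a feature subset; return (compactness, separation)."""
    points = [[row[j] for j in feature_idx] for row in samples]
    centroids, labels = kmeans(points, k)
    # Compactness: mean distance from each sample to its own centroid.
    intra = sum(
        math.dist(p, centroids[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    # Separation: mean pairwise distance between centroids.
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    inter = sum(math.dist(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
    return intra, inter


def proxy_score(stats, weights, bias=0.0):
    """Linear combination of cluster statistics (hypothetical stand-in for CCAI)."""
    return bias + sum(w * s for w, s in zip(weights, stats))


# Toy data (assumption): feature 0 separates two groups, feature 1 is pure noise.
rng = random.Random(42)
samples = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 1.0)] for _ in range(20)]
samples += [[rng.gauss(5.0, 0.3), rng.gauss(0.0, 1.0)] for _ in range(20)]

informative = cluster_stats(samples, [0])
noisy = cluster_stats(samples, [1])

# Hypothetical weights: penalise loose clusters, reward well-separated ones.
weights = (-1.0, 1.0)
print("informative subset score:", proxy_score(informative, weights))
print("noisy subset score:", proxy_score(noisy, weights))
```

A search algorithm such as RF-RFE or DE could then rank candidate subsets by a score of this kind and train classifiers only for the most promising ones, which is where the time saving described in the text would come from.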
The accuracy, area under the curve (AUC), and specificity based on combined 3D-CNN and radiomic features extracted from combined T2 and ADC images were 0.852, 0.871, and 0.735, respectively. Our experiments showed higher predictive power for high-order abstract features extracted by the CNN from ADC and T2 images (AUC = 0.846) than for the traditional radiomic model (AUC = 0.714). Additionally, models built on radiomic and CNN features extracted from ADC images predicted treatment response better than models built on the corresponding features from T2 images. The average time for a single computation was 50.5 s for DE and 128.6 s for RF-RFE without CCAI, and 24.2 s and 91.3 s, respectively, with CCAI. The proposed unsupervised clustering analysis mechanism improves the effectiveness of feature selection while markedly decreasing its time cost, and it highlights the correlation and complementarity between low- and high-level imaging features, achieving better predictive accuracy.
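The reported timings can be turned into relative savings with a short calculation. The sketch below assumes the figures pair up in the order given (DE: 50.5 s to 24.2 s; RF-RFE: 128.6 s to 91.3 s); the dictionary layout and function name are illustrative only.

```python
# Average single-run times in seconds, as reported in the text.
timings = {
    "DE": {"baseline": 50.5, "with_ccai": 24.2},
    "RF-RFE": {"baseline": 128.6, "with_ccai": 91.3},
}


def relative_reduction(baseline, reduced):
    """Fraction of the baseline time that is saved."""
    return (baseline - reduced) / baseline


reductions = {
    name: relative_reduction(t["baseline"], t["with_ccai"])
    for name, t in timings.items()
}

for name, r in reductions.items():
    print(f"{name}: {r:.1%} less time per run")
```

Under that pairing, the saving is roughly 52% for DE and 29% for RF-RFE.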