Accurate classification is critical in mammography computer-aided diagnosis using content-based image retrieval approaches (CBIR CAD). The objectives of this study were to: 1) develop an accurate ensemble classifier based on domain knowledge and a robust feature selection method for CBIR CAD; 2) propose three new features; and 3) assess the performance of the proposed method and new features using a relatively large imaging data set. The data set consisted of 2114 regions of interest (ROIs) extracted from a publicly available image database. The proposed ensemble classifier method, called E-DGA-KNN, comprised four steps. First, the 804 ROIs depicting masses were divided into five classes according to their boundary types, and each class of mass ROIs was combined with an equal number of negative ROIs to create a sub-database. Second, a dual-stage genetic algorithm (DGA) was applied to the five sub-databases for feature selection and weight determination, respectively. Third, five base K-nearest neighbor (KNN) classifiers were created by applying the results of the second step to the 2114 ROIs, and five detection scores were obtained for a given queried ROI. Finally, these classifiers were combined to yield a final classification. The performance of the proposed methods was evaluated using receiver operating characteristic (ROC) analysis. A comparison with eight different methods on the data set was provided, including the stepwise linear discriminant analysis (SLDA) algorithm and the particle swarm optimization (PSO) algorithm with a KNN classifier. When four hybrid feature selection methods with a single KNN classifier (i.e., DGA-KNN, SLDA-WGA-KNN, SLDA-PSO-KNN, and GA-PSO-KNN) and the proposed E-DGA-KNN method were applied to the data set, the computed areas under the ROC curve (Az) were 0.8782 ± 0.0080, 0.8675 ± 0.0081, 0.8623 ± 0.0083, 0.8725 ± 0.0079, and 0.8927 ± 0.0073, respectively.
If all features and a single KNN classifier were used, the Az value was 0.8478 ± 0.0088. When the SLDA or GA algorithm was used alone, the Az values were 0.8592 ± 0.0083 and 0.8632 ± 0.0081, respectively. In this study, an ensemble classifier based on domain knowledge and a dual-stage feature selection method was proposed. Evaluation results indicated that the proposed method achieved the largest Az value among the compared algorithms. The proposed method outperformed the compared methods and has the potential to improve the performance of CBIR CAD in interpreting and analyzing mammograms.
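The third and fourth steps of the method can be illustrated with a minimal sketch. The abstract does not state how a detection score is computed or how the five base classifiers are combined, so the sketch makes its own assumptions. The score is taken to be the fraction of positive ROIs among the K nearest reference ROIs under a feature-weighted Euclidean distance, and the ensemble output is taken to be the plain average of the base scores. The names `knn_score`, `ensemble_score`, `masks`, and `weight_sets` are hypothetical and do not come from the study.

```python
import numpy as np


def knn_score(query, refs, labels, mask, weights, k=3):
    """Detection score from one base KNN classifier (illustrative only).

    mask    : boolean vector marking the selected features (stage 1 of a DGA-like search)
    weights : non-negative feature weights (stage 2 of a DGA-like search)
    Assumption: score = fraction of positive ROIs among the k nearest references.
    """
    sel = np.flatnonzero(mask)
    # Weighted Euclidean distance restricted to the selected features.
    diff = (refs[:, sel] - query[sel]) * np.sqrt(weights[sel])
    dist = np.sqrt((diff ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return float(labels[nearest].mean())


def ensemble_score(query, refs, labels, masks, weight_sets, k=3):
    """Combine the base classifiers; simple averaging is an assumption."""
    scores = [
        knn_score(query, refs, labels, m, w, k)
        for m, w in zip(masks, weight_sets)
    ]
    return float(np.mean(scores)), scores
```

In this sketch every base classifier searches the same reference set and differs only in its feature mask and weights, mirroring the description of five classifiers built on the full set of ROIs from five sub-database-specific feature selections.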