Principal component analysis, classifier complexity, and robustness of sonographic breast lesion classification

K Drukker, N P Gruszauskas, M L Giger

doi:10.1117/12.811341

Abstract

We investigated three classifiers for the task of distinguishing between benign and malignant breast lesions. Classification performance was measured in terms of the area under the ROC curve (AUC value). We compared linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and a Bayesian neural net (BNN) with 5 hidden units. For each lesion, 46 image features were extracted, and the principal components of these features, obtained by principal component analysis (PCA), were used as classifier input. For each classifier, the optimal number of principal components was determined by performing PCA within each step of a leave-one-case-out protocol on the training dataset (1125 lesions, 14% cancer prevalence) and selecting the number of components that maximized the AUC value. Subsequently, each classifier was trained on the full training dataset and applied ‘cold turkey’ to an independent test set from a different population (341 lesions, 30% cancer prevalence). The optimal number of principal components for LDA was 24, accounting for 97% of the variance in the image features. For QDA and BNN, the optimal numbers were 5 (70% of the variance) and 15 (93%), respectively. LDA, QDA, and BNN obtained AUC values of 0.88, 0.85, and 0.91, respectively, in the leave-one-case-out analysis. In the independent test, the AUC values were 0.88, 0.76, and 0.82, respectively. Only LDA matched its training-set performance (lower bound of the 95% non-inferiority interval: -0.0067), while the other two classifiers performed significantly worse (p-values << 0.05). While the more complex BNN classifier outperformed the others in the leave-one-case-out analysis of a large dataset, LDA was the robust best performer in the independent test.
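The component-selection protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, synthetic data in place of the sonographic features, one lesion per case, features that are centered but not rescaled before PCA, and a plain two-class LDA with a pooled covariance matrix. The function names (`auc`, `fit_pca`, `lda_score`, `loo_auc`) and the data sizes are hypothetical. What the sketch shows is that PCA is refit inside every leave-one-out step, so the held-out lesion never influences the components it is projected onto.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic; tied pairs count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

def fit_pca(X, k):
    """Return the training mean and the first k principal axes (as columns)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def lda_score(Z_train, y_train, z):
    """Two-class LDA discriminant score for one sample z (pooled covariance)."""
    Z0, Z1 = Z_train[y_train == 0], Z_train[y_train == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    pooled = ((Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)) / (len(Z_train) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    return float((z - 0.5 * (m0 + m1)) @ w)

def loo_auc(X, y, k):
    """Leave-one-out AUC with PCA refit on the training part of every step."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        mean, axes = fit_pca(X[train], k)        # PCA never sees sample i
        Z_train = (X[train] - mean) @ axes
        z_test = (X[i] - mean) @ axes
        scores[i] = lda_score(Z_train, y[train], z_test)
    return auc(scores, y)

# Synthetic stand-in data (assumption): 120 lesions, 10 features, 25% "malignant".
rng = np.random.default_rng(0)
n_lesions, n_features = 120, 10
y = (np.arange(n_lesions) % 4 == 0).astype(int)
X = rng.normal(size=(n_lesions, n_features)) + 1.5 * y[:, None]

# Pick the number of components that maximizes the leave-one-out AUC.
aucs = {k: loo_auc(X, y, k) for k in range(1, 6)}
best_k = max(aucs, key=aucs.get)
print("best number of components:", best_k, "leave-one-out AUC:", round(aucs[best_k], 3))
```

Because the number of components is itself chosen to maximize the leave-one-out AUC, that AUC is optimistically biased as an estimate of performance on new data. This is one reason an independent test set, as used in the study, gives a more trustworthy comparison between classifiers of different complexity.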
