Abstract

Clinical decision support systems use image processing and machine learning methods to objectively predict cancer in histopathological images. Integral to the development of machine learning classifiers is the ability to generalize from training data to unseen future data. A classification model's ability to accurately predict class labels for new, unseen data is measured by performance metrics, which also inform the classifier model selection process. Based on our research, metrics commonly used in the literature (such as accuracy and the ROC curve) do not accurately reflect the trained model's robustness. To the best of our knowledge, no research has been conducted to quantitatively compare performance metrics in the context of cancer prediction in histopathological images. In this paper, we evaluate various performance metrics and show that the Lift metric has the highest correlation between the internal and external validation sets of a nested cross-validation pipeline (R² = 0.57). Thus, we demonstrate that the Lift metric best generalizes classifier performance among the 23 metrics that were evaluated. Using the Lift metric, we develop a classifier with a misclassification rate of 0.25 (4-class classifier) on data that the model was not trained on (external validation).
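
The following is a minimal sketch, not the authors' pipeline, of the kind of analysis the abstract describes: record a metric (here, lift, i.e. precision relative to the baseline positive rate) on the inner (internal) folds and on the held-out outer (external) folds of a nested cross-validation, then correlate the two sets of scores with R². The dataset, model, and lift_score helper are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def lift_score(y_true, y_pred):
    """Lift = precision of positive predictions / prevalence of positives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    predicted_pos = np.sum(y_pred == 1)
    prevalence = np.mean(y_true == 1)
    if predicted_pos == 0 or prevalence == 0:
        return 0.0
    return (tp / predicted_pos) / prevalence

# Synthetic stand-in for a histopathology feature set.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
internal_scores, external_scores = [], []

for train_idx, test_idx in outer.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]

    model = RandomForestClassifier(n_estimators=100, random_state=0)

    # Internal validation: cross-validated predictions within the training fold.
    inner_pred = cross_val_predict(model, X_tr, y_tr, cv=3)
    internal_scores.append(lift_score(y_tr, inner_pred))

    # External validation: fit on the training fold, score on the held-out fold.
    model.fit(X_tr, y_tr)
    external_scores.append(lift_score(y_te, model.predict(X_te)))

# R² of the relationship between internal and external scores: a higher value
# suggests the metric measured internally generalizes to unseen data.
r = np.corrcoef(internal_scores, external_scores)[0, 1]
print("R² between internal and external lift:", r ** 2)
```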
