Investigating the effects of majority voting on CAD systems: a LIDC case study

Miguel Carrazza, Brendan Kennedy, Jacob Furst, Alexander Rasin, Daniela Raicu, Georgia D Tourassi, Samuel G Armato

doi:10.1117/12.2217328

Abstract

Computer-Aided Diagnosis (CAD) systems can provide a second opinion, either by identifying suspicious regions on a medical image or by predicting the degree of malignancy of a detected suspicious region. To develop a predictive model, CAD systems are trained on low-level image features extracted from image data and on class labels acquired through radiologists’ interpretations or a gold standard (e.g., a biopsy). The opinion of an expert radiologist is only an estimate of the true answer, but the ground truth may be extremely expensive to acquire. In such cases, CAD systems are trained on input data that contains multiple expert opinions per case, with the expectation that the aggregate of the labels will closely approximate the ground truth. Using multiple labels has its own challenges because of the inherent label uncertainty introduced by the variability in the radiologists’ interpretations. Most CAD systems use majority voting (e.g., average, mode) to handle label uncertainty. This paper investigates the effects that majority voting can have on a CAD system by classifying and analyzing different semantic characteristics supplied with the Lung Image Database Consortium (LIDC) dataset. Using a decision-tree-based iterative predictive model, we show that majority voting on labels that exhibit certain types of skewed distributions can have a significant negative impact on the performance of a CAD system; therefore, alternative strategies for label integration are required when handling multiple interpretations.
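To make the aggregation step concrete, the following is a minimal sketch of the two majority-voting variants named in the abstract (mode and average). It is not the authors' implementation. The function names, the 1-5 ordinal rating scale, the four-reader panels, the tie-breaking rule, and the example ratings are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_by_mode(ratings):
    """Return the most frequent rating.

    Ties are broken toward the smallest value; this is an arbitrary,
    assumed rule, not one taken from the paper.
    """
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)


def aggregate_by_mean(ratings):
    """Return the mean rating rounded half up to an integer label.

    Assumes ratings are positive, so int(x + 0.5) rounds half up.
    """
    return int(mean(ratings) + 0.5)


# Hypothetical ratings from four readers for two nodules (assumed 1-5 scale).
panels = {
    "agreeing": [3, 3, 3, 3],
    "skewed": [1, 1, 1, 5],
}

for name, ratings in panels.items():
    print(name, "mode:", aggregate_by_mode(ratings), "mean:", aggregate_by_mean(ratings))
```

In the skewed panel, both aggregates discard the dissenting reader: the single consensus label carries no trace of the disagreement. This is the kind of information loss that motivates examining how the label distribution interacts with majority voting.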

Full Text