Empirical evaluation of bias field correction algorithms for computer-aided detection of prostate cancer on T2w MRI

Satish Viswanath, Pratik Patel, Anant Madabhushi, Daniel Palumbo, Jonathan Chappelow, Neil Rofsky, Elizabeth Genega, B Nicholas Bloch, Robert Lenkinski, Ronald M Summers, Bram Van Ginneken

doi:10.1117/12.878813

Abstract

In magnetic resonance imaging (MRI), intensity inhomogeneity is an acquisition artifact that introduces a non-linear variation in the signal intensities within the image. Intensity inhomogeneity is known to significantly affect computerized analysis of MRI data (such as automated segmentation or classification procedures), and hence requires the application of bias field correction (BFC) algorithms to account for this artifact. Quantitative evaluation of BFC schemes is typically performed using generalized intensity-based measures (percent coefficient of variation, %CV) or information-theoretic measures (entropy). While some investigators have previously compared BFC schemes empirically in different domains (using changes in %CV and entropy to quantify improvements), no consensus has emerged as to the best BFC scheme for any given application. The motivation for this work is that the choice of a BFC scheme for a given application should be dictated by application-specific measures rather than ad hoc measures such as entropy and %CV. In this paper, we address the problem of determining an optimal BFC algorithm in the context of a computer-aided diagnosis (CAD) scheme for prostate cancer (CaP) detection from T2-weighted (T2w) MRI. The primary goal of this work is to identify a BFC algorithm that maximizes CaP classification accuracy, measured in terms of the area under the ROC curve (AUC). A secondary aim is to determine whether measures such as %CV and entropy are correlated with a classifier-based objective measure (AUC). Determining the presence or absence of these correlations is important for understanding whether domain-independent BFC performance measures such as %CV and entropy should be used to identify the optimal BFC scheme for any given application.
To answer these questions, we quantitatively compared 3 popular BFC algorithms on a cohort of 10 clinical 3 Tesla prostate T2w MRI datasets (comprising 39 2D MRI slices): N3, PABIC, and the method of Cohen et al. The results of BFC via each algorithm were evaluated in terms of %CV, entropy, and classifier AUC for CaP detection from T2w MRI. The CaP classifier was trained and evaluated on a per-pixel basis using annotations of CaP obtained via registration of T2w MRI and ex vivo whole-mount histology sections. Our results revealed that different BFC schemes optimized different performance measures; that is, the BFC scheme identified by minimization of %CV and entropy was not the one that maximized AUC. Moreover, the existing BFC evaluation measures (%CV, entropy) did not correlate with AUC (the application-based evaluation) but did correlate with each other, suggesting that domain-specific performance measures should be considered when choosing an appropriate BFC scheme. Our results also revealed that N3 provided the best correction of bias field artifacts in prostate MRI data when the goal was to identify prostate cancer.
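
To make the three evaluation measures named above concrete, the following is a minimal Python sketch of how %CV, entropy, and AUC might be computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give exact definitions, so %CV is assumed to be 100 × standard deviation / mean over a region, entropy is assumed to be the Shannon entropy of a fixed-range intensity histogram, and AUC is computed via the rank-based (Mann-Whitney) formulation. The function names, the bin count, the intensity range, and the synthetic ramp-shaped bias field are all hypothetical.

```python
import numpy as np

def percent_cv(pixels):
    """Percent coefficient of variation: 100 * std / mean (assumed definition)."""
    pixels = np.asarray(pixels, dtype=float)
    return 100.0 * pixels.std() / pixels.mean()

def shannon_entropy(pixels, bins=64, value_range=(0.0, 200.0)):
    """Shannon entropy in bits of the intensity histogram.

    A fixed range is used so that images are compared on the same bins
    (bin count and range are illustrative choices).
    """
    counts, _ = np.histogram(pixels, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def roc_auc(scores, labels):
    """AUC as the probability that a positive pixel outscores a negative one.

    Ties count as one half (Mann-Whitney formulation).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic homogeneous tissue region: constant intensity plus mild noise.
rng = np.random.default_rng(0)
clean = 100.0 + 2.0 * rng.standard_normal((64, 64))

# Hypothetical multiplicative bias field: a smooth left-to-right ramp.
ramp = np.linspace(0.7, 1.3, 64)[None, :]
biased = clean * ramp

print("%CV     clean vs. biased:", percent_cv(clean), percent_cv(biased))
print("entropy clean vs. biased:", shannon_entropy(clean), shannon_entropy(biased))
```

In this toy setting both %CV and entropy are lower for the bias-free region, which is why they are commonly minimized when evaluating BFC. Neither value says anything about per-pixel classifier performance, which is what `roc_auc` would measure given classifier scores and ground-truth labels.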
