Performance investigation of deep learning versus classifier for polyp differentiation via texture features

David Liang, David Wang, Marc J Pomeroy, Shu Zhang, Alice Wei, Perry J Pickhardt, Yeseul Choi, Maciej A Mazurowski, Horst K Hahn

doi:10.1117/12.2550007

Abstract

Computer-aided diagnosis (CADx) of polyps is essential for giving computed tomography colonography (CTC) diagnostic capability. In this paper, we compare the performance of deep learning and a Random Forest (RF) classifier for polyp differentiation in CTC. First, we extracted features with an extended Haralick model (eHM), yielding a total of 30 texture features. We also generated the gray level co-occurrence matrix (GLCM), which encodes 3D CT image information in a 2D matrix, as input to a convolutional neural network (CNN). We then cast polyp classification into two state-of-the-art frameworks: eHM texture features with RF, and GLCM texture matrices with CNN. We evaluated both by the area under the receiver operating characteristic curve (AUC) on 1,278 pathology-confirmed polyps. The results show that, once the data are balanced, both the CNN model and the RF classifier learn or analyze the features effectively and achieve high performance. The RF classifier generally outperformed the CNN model, with a gain of 6.4% on balanced datasets and 5.4% on unbalanced datasets, showing its effectiveness in feature extraction and analysis for polyp differentiation. However, the performance of the CNN improved when new data were added, with a gain of 3.6% on balanced datasets and 3.4% on unbalanced datasets, whereas the RF classifier showed no gain when the datasets were enlarged. This suggests that CNN models have the potential to improve classification performance when larger datasets are available. This study provides valuable information on how to design experiments to improve CADx of polyps.
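To make the GLCM step concrete, the following is a minimal sketch of how a 3D volume can be encoded as a 2D co-occurrence matrix and how a few classic Haralick-style descriptors can be derived from it. It is not the eHM used in the study: the function names `glcm_3d` and `texture_features`, the choice of a single voxel offset, the number of gray levels, and the three descriptors shown are all illustrative assumptions, and the volume is assumed to be already quantized to integer gray levels.

```python
import numpy as np


def glcm_3d(volume, offset, levels):
    """Symmetric, normalized GLCM of an integer-valued 3D volume.

    Assumptions (not from the paper): `volume` holds integers in
    [0, levels), and `offset` is a single (dz, dy, dx) displacement.
    """
    def paired_slices(d, n):
        # Slice of source voxels and slice of their neighbors displaced by d.
        return (slice(max(0, -d), n - max(0, d)),
                slice(max(0, d), n - max(0, -d)))

    slices = [paired_slices(d, n) for d, n in zip(offset, volume.shape)]
    src = volume[tuple(s[0] for s in slices)].ravel()
    dst = volume[tuple(s[1] for s in slices)].ravel()

    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src, dst), 1.0)   # accumulate co-occurring pairs
    counts += counts.T                   # count each pair in both directions
    return counts / counts.sum()         # normalize to joint probabilities


def texture_features(p):
    """Three classic descriptors computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    diff2 = (i - j) ** 2
    return {
        "contrast": float(np.sum(p * diff2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + diff2))),
    }


# Illustrative use on a random, already-quantized volume (hypothetical data).
rng = np.random.default_rng(0)
demo_volume = rng.integers(0, 8, size=(6, 6, 6))
demo_glcm = glcm_3d(demo_volume, offset=(0, 0, 1), levels=8)
demo_features = texture_features(demo_glcm)
```

In a setup like the one described above, a matrix such as `demo_glcm` would play the role of the CNN input, while a vector of descriptors such as `demo_features` would play the role of the RF input; in practice several offsets and directions would typically be combined.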

Full Text