From convolutional to recurrent: Case study in Nasopharyngeal Carcinoma segmentation

Ao Feng,Zhenghao Chen,Zongqing Ma,Xi Wu

doi:10.1109/fads.2017.8253187

Abstract

Convolutional Neural Network (CNN) is the prevalent framework of image classification, and is also widely used in other image applications including semantic segmentation. Advantages of CNN include hierarchical extraction of patterns, effective training and a small number of parameters. Do they make CNN the ideal candidate for all image processing tasks? In a patch-based semantic image segmentation experiment of Nasopharyngeal Carcinoma (NPC), three different network structures are tested on the same input — CNN, Recurrent Neural Network (RNN) and a mixed model. Results show that RNN achieves comparable accuracy to CNN, and replacing fully- connected layers with RNN further improves performance. A 2D Graph Cut algorithm follows each network to post-process heat maps for a global optimal solution. Within the limited scope of a patch, CNN, RNN and RNN-injected CNN return similar performance metrics, but it becomes a more complex issue to design an appropriate network structure for the complete image, which may be much larger or even in higher dimension. Fully Convolutional Networks (FCNs) have also been proposed for semantic image segmentation, in which the full image goes through consecutive convolutional units for coarse-grained features, then the smaller feature maps must be up-sampled to match the original size. A multidimensional recurrent network at the pixel level or on top of CNN-processed patches will be explored next, aiming at a larger receptive field and higher training efficiency.

Full Text