Deep learning has substantially advanced landmark detection, and the cascade-connected neural network (CCNN) is among the most widely used deep learning methods for this task. A CCNN consists of several stacked network backbones, where the predictions of each backbone serve as the input to the next. Because of GPU memory bottlenecks, CCNNs have two limitations. First, the number and depth of the network backbones are restricted, which limits the learning ability of CCNNs. Second, CCNNs are usually trained on low-resolution images, even though the neighboring pixels in high-resolution images are often vital for landmark detection, especially cephalometric landmark detection. This paper interprets CCNNs as a discrete approximation of ordinary differential equations. Building on this interpretation, we propose a novel model, called the cascade-refine model, which retains the advantages of CCNNs while overcoming the limitations on backbone number and depth by sharing parameters among the stacked network backbones. Moreover, the proposed model follows a coarse-to-fine design: a global module generates coarse landmark locations, and a local module corrects the pixel-level error of the latent landmark locations within the region of interest. The proposed model is trained in an end-to-end manner, so the detailed neighboring pixels of the high-resolution image are exploited directly for cephalometric landmark detection. The proposed cascade-refine model outperforms state-of-the-art methods on the public Automatic Cephalometric X-ray Landmark Detection Challenge 2015 dataset. We also built two private cephalometric X-ray datasets, and experimental results on both datasets demonstrate the good performance of the proposed model.
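The ODE reading of a cascade with shared parameters can be illustrated with a minimal sketch. It assumes a forward-Euler-style residual update, x_{k+1} = x_k + h * f(x_k), in which the same function f is reused at every stage; the abstract does not state the exact discretization, so this form is an assumption. The names `cascade_refine` and `make_toy_backbone`, the step size `h`, and the toy update rule are hypothetical stand-ins and not the paper's network.

```python
from typing import Callable, List

Vector = List[float]


def cascade_refine(
    x0: Vector,
    step_fn: Callable[[Vector], Vector],
    num_stages: int,
    h: float = 1.0,
) -> List[Vector]:
    """Apply the SAME step function at every stage (shared parameters).

    Each stage is one forward-Euler step, x <- x + h * f(x), which is the
    assumed discrete approximation of dx/dt = f(x).
    Returns the full trajectory of landmark estimates, including x0.
    """
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(num_stages):
        dx = step_fn(x)  # predicted correction from the shared "backbone"
        x = [xi + h * di for xi, di in zip(x, dx)]
        trajectory.append(list(x))
    return trajectory


def make_toy_backbone(target: Vector, gain: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a learned backbone.

    It moves the estimate a fixed fraction of the way toward a known target;
    a real backbone would predict the correction from image features.
    """
    def step_fn(x: Vector) -> Vector:
        return [gain * (t - xi) for t, xi in zip(target, x)]
    return step_fn


# Toy usage: refine a 2-D landmark estimate over three shared-parameter stages.
backbone = make_toy_backbone(target=[8.0, 4.0], gain=0.5)
traj = cascade_refine([0.0, 0.0], backbone, num_stages=3)
```

Because every stage calls the same `step_fn`, adding stages in this sketch does not add parameters, which is the property the abstract attributes to parameter sharing among stacked backbones.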