Existing deep-learning-based contour detectors produce predictions that lack sharpness and localization accuracy, so offline post-processing is often required to sharpen the results and improve model performance. In this work, we present a novel method that learns to refine object contours during training and directly outputs crisp object boundaries at inference. To this end, we first introduce a keypoint-focal loss that draws point-based attention to the isolated contour annotations, allowing an edge detector to jointly optimize the thickness and the localization accuracy of its predictions during training. Moreover, we present a regularization loss that further improves the performance of an edge detector. Lastly, we present the Contour Transformer model for precisely localizing object boundaries in images. We repurpose and integrate a Transformer-style hyper module into an encoder–decoder network, effectively aggregating global contextual information over high-level features and significantly enhancing the discriminative power for classifying foreground/background pixels. We train and test our Attention and Contour Transformer detector (ACTD) on four widely adopted datasets, i.e., BSDS500, NYUD, Multi-Cue, and RoadNet. The proposed method achieves an ODS F-score of 0.826 on BSDS500 and 0.783 on NYUD, outperforming previous top detectors.
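To make the keypoint-focal idea concrete, the sketch below shows a generic binary focal loss applied per pixel to a predicted edge-probability map, the standard formulation such a loss typically builds on. The function name `keypoint_focal_loss` and the default `alpha`/`gamma` values are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def keypoint_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Minimal sketch of a focal-style loss for sparse contour labels.

    logits:  (B, 1, H, W) raw edge logits from the detector.
    targets: (B, 1, H, W) binary contour annotations in {0, 1}.
    alpha, gamma: standard focal-loss hyperparameters (assumed values).
    """
    probs = torch.sigmoid(logits)
    # p_t: probability the model assigns to the true class at each pixel.
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    # alpha_t: class-balancing weight for the sparse positive (contour) pixels.
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # (1 - p_t)^gamma down-weights easy pixels so training focuses on
    # hard, poorly localized contour points.
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()

# Usage sketch (model and edge_maps are hypothetical names):
# logits = model(images)                       # (B, 1, H, W)
# loss = keypoint_focal_loss(logits, edge_maps.float())
```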