Instance segmentation is an important yet challenging task in computer vision. Existing mainstream single-stage solutions with parameterized mask representations design neck modules to fuse features from different layers; however, segmentation performance remains limited by their layer-by-layer transmission scheme. In this article, an instance segmentation framework with an adaptive long-neck (ALN) network and an atrous-residual structure is proposed. The long-neck network consists of two cascaded bidirectional fusion units that facilitate information exchange among features of different layers along top-down and bottom-up pathways. In particular, a new cross-layer transmission scheme is introduced in the top-down pathway to achieve hybrid dense fusion of multiscale features, and the weights of the different features are learned adaptively according to their respective contributions, which promotes network convergence. Meanwhile, the bottom-up pathway further complements the features with additional location cues. In this way, high-level semantic information and low-level location information are tightly integrated. Furthermore, an atrous-residual structure is added to the mask-prototype branch of instance prediction to capture more contextual information, which contributes to the generation of high-quality masks. Experimental results indicate that the proposed method achieves effective segmentation and that the output masks closely match object contours.
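The adaptive cross-layer fusion described above can be sketched as follows. This is a minimal illustration only, assuming softmax-normalized scalar weights per input and feature maps already resized to a common spatial size; the function names are hypothetical and the paper's actual normalization and fusion scheme may differ:

```python
import math

def softmax(weights):
    """Normalize raw fusion weights so they sum to 1 (assumed scheme)."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_fuse(features, raw_weights):
    """Weighted sum of same-size feature maps (flattened to 1-D lists),
    where each input's contribution is a learnable scalar weight."""
    alphas = softmax(raw_weights)
    length = len(features[0])
    return [sum(a * f[i] for a, f in zip(alphas, features))
            for i in range(length)]

def top_down_dense(features, raw_weight_table):
    """Hybrid dense top-down pass: each level fuses its own feature with
    *all* previously fused higher levels (cross-layer transmission),
    not just its immediate neighbor. `features` is ordered from the
    highest (most semantic) to the lowest level."""
    fused = []
    for i, feat in enumerate(features):
        inputs = [feat] + fused  # current level plus every higher fused level
        fused.append(adaptive_fuse(inputs, raw_weight_table[i][:len(inputs)]))
    return fused
```

For example, with three levels whose (toy, constant-valued) features are `[4.0]*4`, `[2.0]*4`, and `[0.0]*4` and all raw weights zero, each lower level receives an equal-weight blend of itself and every higher level, so the lowest output converges toward the mean of all three.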