Abstract
Instance segmentation is one of the most challenging tasks in computer vision, as it requires separating each object instance at the pixel level. To date, low-resolution binary masks have been the dominant paradigm for representing instance masks. For example, the predicted mask in Mask R-CNN is usually 28×28. Generally, a low-resolution mask cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes a high-resolution structured mask into a compact representation, combining the advantages of high quality and low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch that predicts this representation. Our model outperforms explicit contour-based pipelines in accuracy at similar computational cost. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet
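To make the central idea concrete, the sketch below shows one common way such a compact mask representation can be realized: a linear (PCA-style) projection of flattened high-resolution binary masks onto a small learned basis, so each mask is stored as a short coefficient vector and reconstructed on demand. This is only an illustrative assumption; the abstract does not specify the paper's actual encoder, and all function names here are hypothetical.

```python
# Illustrative sketch only: assumes a PCA-style linear encoding of flattened
# binary masks into short coefficient vectors. The paper's encoder may differ.
import numpy as np


def fit_mask_codebook(masks: np.ndarray, n_components: int = 60):
    """Learn a linear basis from high-resolution binary masks.

    masks: (N, H, W) array of {0, 1} training masks.
    Returns (mean, components), where components has shape (n_components, H*W).
    """
    n, h, w = masks.shape
    flat = masks.reshape(n, h * w).astype(np.float32)
    mean = flat.mean(axis=0)
    # SVD of the centered data gives the principal directions (rows of vt).
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode_mask(mask, mean, components):
    """Project one (H, W) binary mask onto the basis -> compact code vector."""
    flat = mask.reshape(-1).astype(np.float32) - mean
    return components @ flat


def decode_mask(code, mean, components, shape=(128, 128)):
    """Reconstruct a high-resolution binary mask from its compact code."""
    flat = components.T @ code + mean
    return (flat.reshape(shape) > 0.5).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "masks": random filled rectangles at 128x128 resolution.
    train = np.zeros((200, 128, 128), dtype=np.uint8)
    for m in train:
        y0, x0 = rng.integers(0, 64, size=2)
        y1, x1 = y0 + rng.integers(16, 64), x0 + rng.integers(16, 64)
        m[y0:y1, x0:x1] = 1

    mean, comps = fit_mask_codebook(train, n_components=60)
    code = encode_mask(train[0], mean, comps)   # 60 numbers instead of 128*128 pixels
    recon = decode_mask(code, mean, comps)
    iou = np.logical_and(recon, train[0]).sum() / np.logical_or(recon, train[0]).sum()
    print("code length:", code.shape[0], "reconstruction IoU:", iou)
```

In this kind of scheme, a detection head only needs to regress the low-dimensional code per instance, which keeps training complexity close to that of a low-resolution mask head while the decoded mask retains high-resolution detail.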