Abstract

Multi-label image classification is one of the most important and fundamental problems in computer vision. In an image with multiple labels, the objects are usually located at various positions with different scales and poses. Moreover, some labels are associated with the entire image rather than with a small region. Therefore, both global and local information are important for classification. To effectively extract and make full use of this information, we present a novel deep Dual-stream nEtwork for the muLTi-lAbel image classification task, DELTA for short. As its name indicates, it is composed of two streams: the Multi-Instance network and the Global Priors network. The former extracts multi-scale, class-related local instance features by modeling the classification problem in a multi-instance learning framework. The latter is devised to capture global priors from the input image as the global information. The two streams are combined by a final fusion layer. In this way, DELTA can extract and make full use of both global and local information for classification. Extensive experiments on three benchmark datasets, i.e., PASCAL VOC 2007, PASCAL VOC 2012 and Microsoft COCO, demonstrate that DELTA significantly outperforms several state-of-the-art methods. Moreover, DELTA can automatically locate the key image patterns that trigger the labels.
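
The dual-stream idea can be illustrated with a minimal sketch. It is not DELTA's actual architecture: it assumes per-region class scores are already available, uses max pooling over instances (a common multi-instance learning choice), and stands in for the fusion layer with a fixed weighted average. The names `mil_max_pool`, `fuse`, `predict`, and the parameters `alpha` and `threshold` are hypothetical and do not come from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mil_max_pool(instance_scores):
    """Local stream (assumed): image-level score per class is the max over instances."""
    num_classes = len(instance_scores[0])
    return [max(inst[c] for inst in instance_scores) for c in range(num_classes)]

def fuse(local_scores, global_scores, alpha=0.5):
    """Fusion (assumed): convex combination of the two streams' class scores."""
    return [alpha * l + (1.0 - alpha) * g for l, g in zip(local_scores, global_scores)]

def predict(instance_scores, global_scores, alpha=0.5, threshold=0.5):
    """Return per-class boolean labels and probabilities for one image."""
    local = mil_max_pool(instance_scores)
    fused = fuse(local, global_scores, alpha)
    probs = [sigmoid(z) for z in fused]
    return [p >= threshold for p in probs], probs

# Toy input: 3 candidate regions scored for 2 classes, plus whole-image scores.
instance_scores = [[2.0, -3.0], [-1.0, -2.5], [0.5, -4.0]]
global_scores = [1.0, -1.0]
labels, probs = predict(instance_scores, global_scores)
print(labels)
```

In this toy setting, the region that attains the maximum for a class also indicates where that label is triggered, which loosely mirrors the localization ability claimed above. A learned fusion layer would replace the fixed `alpha`.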
