Abstract

Deep convolutional neural networks (CNNs) have shown superior performance on the task of single-label image classification. However, the applicability of CNNs to multi-label images still remains an open problem, mainly because of two reasons. First, each image is usually treated as an inseparable entity and represented as one instance, which mixes the visual information corresponding to different labels. Second, the correlations amongst labels are often overlooked. To address these limitations, we propose a deep multi-modal CNN for multi-instance multi-label image classification, called MMCNN-MIML. By combining CNNs with multi-instance multi-label (MIML) learning, our model represents each image as a bag of instances for image classification and inherits the merits of both CNNs and MIML. In particular, MMCNN-MIML has three main appealing properties: 1) it can automatically generate instance representations for MIML by exploiting the architecture of CNNs; 2) it takes advantage of the label correlations by grouping labels in its later layers; and 3) it incorporates the textual context of label groups to generate multi-modal instances, which are effective in discriminating visually similar objects belonging to different groups. Empirical studies on several benchmark multi-label image data sets show that MMCNN-MIML significantly outperforms the state-of-the-art baselines on multi-label image classification tasks.
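To make the bag-of-instances idea in the abstract concrete, the sketch below shows one common way to realize MIML on top of a CNN: the spatial cells of the final convolutional feature map are treated as instances, each instance is scored for every label, and instance scores are pooled into image-level label scores. The ResNet-18 backbone, the 1x1-convolution instance classifier, and max-pooling as the aggregator are illustrative assumptions, not the authors' exact MMCNN-MIML architecture (which additionally groups labels and fuses textual context).

import torch
import torch.nn as nn
import torchvision.models as models


class MIMLHead(nn.Module):
    """Scores each spatial instance for each label, then pools over the bag."""

    def __init__(self, in_channels: int, num_labels: int):
        super().__init__()
        # A 1x1 convolution acts as a per-instance (per-spatial-cell) classifier.
        self.instance_classifier = nn.Conv2d(in_channels, num_labels, kernel_size=1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (batch, channels, H, W) -> instance scores (batch, L, H, W)
        instance_scores = self.instance_classifier(feature_map)
        # Bag-level (image-level) score per label = max over the H*W instances.
        bag_scores = instance_scores.flatten(2).max(dim=2).values
        return bag_scores  # (batch, num_labels)


class MIMLCNN(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep the convolutional layers; drop global pooling and the FC head
        # so each spatial cell of the feature map survives as an instance.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.head = MIMLHead(in_channels=512, num_labels=num_labels)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(images))


if __name__ == "__main__":
    model = MIMLCNN(num_labels=20)
    logits = model(torch.randn(2, 3, 224, 224))
    # Multi-label training would typically apply BCEWithLogitsLoss to these logits.
    print(logits.shape)  # torch.Size([2, 20])

In this sketch, the instance representations come for free from the CNN feature map, mirroring the abstract's first property; the label-grouping and multi-modal (textual) components of MMCNN-MIML would be added on top of the instance classifier and are omitted here.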
