Abstract

To train deep neural networks effectively, a large amount of labeled data is typically needed. However, in real-time applications it is difficult and expensive to acquire high-quality labels, because accurately annotating multi-label images takes skill and knowledge. To enhance classification performance, it is also crucial to extract image features from all potential objects of various sizes and to model the relationships between the labels of multi-label images. Current approaches fall short in mapping label dependencies and classifying labels effectively. They also perform poorly at labeling unlabeled images when only a small number of labeled images is available. To address these issues, we propose a new framework for semi-supervised multi-object label classification, MSCNN-LCE-MIC, which uses a multi-stage convolutional neural network with visual attention (MSCNN) and a graph convolutional network (GCN) for label co-occurrence embedding (LCE). The framework combines the GCN and the attention mechanism to concurrently capture local and global label dependencies throughout the entire image classification process. MSCNN-LCE-MIC comprises four main modules: (1) an improved multi-label propagation method for labeling the large pool of available unlabeled images; (2) a feature extraction module using a multi-stage CNN with a visual attention mechanism that focuses on the connections between labels and target regions to extract accurate features from each input image; (3) a label co-existence learning module that applies a GCN to discover the associations between different items and create embeddings of label co-occurrence; and (4) an integrated multi-modal fusion module. Extensive experiments on MS-COCO and PASCAL VOC2007 show that MSCNN-LCE-MIC significantly improves classification performance over the most recent existing methods, reaching mAP of 84.3% and 95.8%, respectively.
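The abstract does not say how the label graph in module (3) is built or how the GCN is configured. As a rough illustration only, the sketch below assumes one common construction: an adjacency matrix derived from conditional label co-occurrence frequencies in the training annotations, followed by a single graph-convolution step. The function names, the toy label matrix, and the embedding sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cooccurrence_adjacency(labels: np.ndarray) -> np.ndarray:
    """Build a row-normalized label co-occurrence matrix with self-loops.

    labels: (num_images, num_labels) multi-hot array of 0/1 annotations.
    This construction is an assumption, not the paper's stated method.
    """
    labels = labels.astype(np.float64)
    counts = labels.T @ labels             # counts[i, j] = images containing both i and j
    occurrences = np.diag(counts).copy()   # occurrences[i] = images containing label i
    # Estimate P(label j | label i); the maximum guards against unseen labels.
    cond = counts / np.maximum(occurrences, 1.0)[:, None]
    np.fill_diagonal(cond, 1.0)            # self-loops keep each label's own feature
    return cond / cond.sum(axis=1, keepdims=True)


def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(A @ H @ W)."""
    return np.maximum(adj @ features @ weight, 0.0)


# Toy data: 4 images annotated with 3 labels (hypothetical).
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])
adj = cooccurrence_adjacency(labels)

rng = np.random.default_rng(0)
node_features = rng.normal(size=(3, 4))    # hypothetical initial label embeddings
weight = rng.normal(size=(4, 2))           # hypothetical layer weights (learned in practice)
label_embeddings = gcn_layer(adj, node_features, weight)
```

Each row of `label_embeddings` mixes a label's own feature with those of the labels it tends to appear with, which is the general idea behind a label co-occurrence embedding. In a full system the weights would be trained jointly with the image feature extractor.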
