Abstract
Multi-object relationship information can help eliminate incorrect combinations or locations of objects, and it also aids in extracting scene information for object recognition. In this paper, we introduce a new way to generate image representations and propose a deep learning framework that fuses the contextual dependencies among objects with scene information in an image. The framework adopts a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) to handle the variable-length sequences produced by local detectors on different images. The resulting representation is then applied to an existing tree context model for further recognition. Experimental results on the SUN09 dataset show that our model outperforms state-of-the-art object localization methods.
Highlights
Standard single detectors [1] focus on identifying particular object categories locally
The outputs of local detectors, such as confidences and locations, serve as the input to the BLSTM-RNN model
BLSTM-RNNs are powerful at incorporating long-range contextual information from both directions without suffering from vanishing gradients
Summary
Standard single detectors [1] focus on identifying particular object categories locally. We use the outputs of local detectors, such as confidences and locations, as the input to the given framework, where they feed the BLSTM-RNN model. Because different images often yield different numbers of detector outputs, the proposed BLSTM-Context model is used to obtain a fixed-size semantic image representation. The obtained representation is then utilized as features to achieve higher recognition accuracy, and we apply it to the tree context model [6] (BLSTM-Tree) for further prediction; a sketch of the encoding step follows below.
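To make the encoding step concrete, below is a minimal sketch in PyTorch (not the authors' code) of how variable-length detector outputs can be reduced to a fixed-size image representation with a bidirectional LSTM. The class name DetectionEncoder, the hidden size, and the 5-dimensional per-detection feature (a confidence score plus a normalized bounding box) are illustrative assumptions, not details taken from the paper.

# Sketch: encode a variable-length sequence of detector outputs
# into a fixed-size vector with a bidirectional LSTM.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class DetectionEncoder(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, feat_dim=5, hidden_size=128):
        super().__init__()
        # Bidirectional LSTM reads the detection sequence in both directions.
        self.blstm = nn.LSTM(feat_dim, hidden_size,
                             batch_first=True, bidirectional=True)

    def forward(self, feats, lengths):
        # feats: (batch, max_len, feat_dim) padded per-detection features
        # lengths: true number of detections in each image
        packed = pack_padded_sequence(feats, lengths,
                                      batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.blstm(packed)
        # Concatenate the final forward and backward hidden states into one
        # fixed-size vector, regardless of how many detections an image had.
        return torch.cat([h_n[0], h_n[1]], dim=1)

# Example: two images with 3 and 1 detections respectively, each detection
# described by [confidence, x, y, w, h].
encoder = DetectionEncoder()
feats = torch.zeros(2, 3, 5)
feats[0, :3] = torch.rand(3, 5)
feats[1, :1] = torch.rand(1, 5)
rep = encoder(feats, torch.tensor([3, 1]))
print(rep.shape)  # torch.Size([2, 256]), a fixed-size representation

The fixed-size vector produced this way could then serve as the feature input to a downstream context model such as the tree context model [6]; that integration is specific to the paper and is not reproduced here.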