Abstract

We present a new multi-modal technique for assisting visually impaired people in recognizing objects in public indoor environments. Unlike common methods, which cast multi-class object recognition as a traditional single-label problem, our approach allows each sample to carry more than one label at a time. We jointly exploit appearance and depth cues, specifically RGBD images, through a new complex-valued representation that overcomes limitations of traditional vision systems. Inspired by complex-valued neural networks (CVNNs) and multi-label learning techniques, we propose two methods that associate each input RGBD image with the set of labels corresponding to the object categories recognized at once. The first, ML-CVNN, is formalized as a ranking strategy: a fully complex-valued RBF network is extended to solve multi-label problems using an adaptive clustering method. The second, L-CVNNs, follows a problem-transformation strategy: instead of using a single network to rank the whole label set, we construct one CVNN per label and later aggregate the predicted labels into the resulting multi-label vector. Extensive experiments carried out on two newly collected multi-label RGBD datasets demonstrate the effectiveness of the proposed techniques.
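The problem-transformation idea behind L-CVNNs can be sketched generically: train one binary classifier per label, then stack the per-label decisions into the multi-label output vector. The sketch below, a minimal illustration and not the paper's method, uses a toy nearest-centroid classifier in place of each CVNN; the complex-valued encoding (appearance as real part, depth as imaginary part) is likewise only an assumed illustration of combining the two cues in one complex feature vector.

```python
import numpy as np

def encode_rgbd(appearance, depth):
    """Combine appearance and depth features into one complex-valued vector
    (illustrative encoding: real part = appearance, imaginary part = depth)."""
    return np.asarray(appearance) + 1j * np.asarray(depth)

class NearestCentroidLabel:
    """Toy per-label binary classifier over complex features,
    standing in for one CVNN in the problem-transformation scheme."""
    def fit(self, X, y):
        # Centroids of positive and negative samples for this label.
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self

    def predict(self, X):
        # Predict 1 when closer (in complex modulus) to the positive centroid.
        d_pos = np.abs(X - self.pos).sum(axis=1)
        d_neg = np.abs(X - self.neg).sum(axis=1)
        return (d_pos < d_neg).astype(int)

def fit_binary_relevance(X, Y):
    """Train one classifier per label column of the label matrix Y."""
    return [NearestCentroidLabel().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_multilabel(models, X):
    """Aggregate per-label predictions into the multi-label vector."""
    return np.stack([m.predict(X) for m in models], axis=1)
```

Swapping the toy classifier for a trained complex-valued network, one per label, recovers the structure of the L-CVNNs strategy described above.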

