Abstract

Different types of visual features provide a multi-modal representation of images in the annotation task. Conventional graph-based image annotation methods integrate the various features into a single descriptor and assign one node to each descriptor on the learning graph. However, such a graph does not capture the information carried by the individual features, making it unsuitable for propagating the labels of annotated images. In this paper, we address this issue by proposing an approach for fusing visual features in which a specific subgraph is constructed for each visual modality and the subgraphs are then connected to form a supergraph. Since the size of the supergraph grows linearly with the number of visual features, the large computational complexity of label propagation on the supergraph must be handled. To this end, we extract a set of prototypes from the feature vectors of the images and incorporate them into the supergraph construction. The learning process is then conducted on the prototypes, instead of the full set of feature vectors, making label inference scalable. Experiments on a wide range of standard datasets show that the proposed approach achieves scalable image annotation while maintaining an acceptable level of performance.
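
The sketch below illustrates the general idea outlined above, not the paper's actual implementation: k-means prototypes are extracted per modality, a k-NN subgraph is built over each modality's prototypes, the subgraphs are linked into a supergraph via shared-image counts, and standard iterative label propagation runs on the prototype nodes. All function names, edge-weighting choices, and parameter values here are illustrative assumptions.

```python
# Illustrative sketch only; the cross-modal edge weights and parameters
# are assumptions, not the method described in the paper.
import numpy as np
from scipy.sparse import block_diag
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def build_supergraph(modalities, n_prototypes=100, n_neighbors=10):
    """modalities: list of (n_images, d_m) feature matrices, one per visual feature."""
    subgraphs, assignments = [], []
    for X in modalities:
        km = KMeans(n_clusters=n_prototypes, n_init=10).fit(X)
        # k-NN subgraph over this modality's prototypes (cluster centres)
        W = kneighbors_graph(km.cluster_centers_, n_neighbors, mode="connectivity")
        subgraphs.append(0.5 * (W + W.T))            # symmetrize the subgraph
        assignments.append(km.labels_)               # image -> prototype map
    S = block_diag(subgraphs).tolil()
    k = n_prototypes
    for a in range(len(modalities)):
        for b in range(a + 1, len(modalities)):
            # Cross-modal edge weight (an assumed choice): number of images
            # assigned to prototype i in modality a and prototype j in modality b.
            overlap = np.zeros((k, k))
            np.add.at(overlap, (assignments[a], assignments[b]), 1.0)
            S[a * k:(a + 1) * k, b * k:(b + 1) * k] = overlap
            S[b * k:(b + 1) * k, a * k:(a + 1) * k] = overlap.T
    return S.tocsr(), assignments

def propagate_labels(S, Y0, alpha=0.9, n_iter=50):
    """Standard iterative propagation F <- alpha * D^{-1} S F + (1 - alpha) * Y0,
    where Y0 holds the initial label scores of the prototype nodes."""
    d = np.asarray(S.sum(axis=1)).ravel()
    d[d == 0] = 1.0
    P = S.multiply(1.0 / d[:, None]).tocsr()         # row-normalized weights
    F = Y0.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1.0 - alpha) * Y0
    return F
```

Image-level annotations would then be read off through each image's prototype assignments; the point the abstract makes is that propagation runs over the (number of modalities) x (number of prototypes) nodes of the supergraph rather than over every feature vector.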
