Abstract

Object categorization in images is fundamental to various industrial areas, such as automated visual inspection, fast image retrieval, and intelligent surveillance. Most existing methods treat visual features (e.g., scale-invariant feature transform) as the content information of the objects, while regarding image tags as their contextual information. However, image tags can hardly be acquired in completely unsupervised settings, especially when the image volume is too large to be annotated manually. In this article, we propose a novel contextual multivariate information bottleneck (CMIB) method to conduct unsupervised image object categorization in multiple visual contexts. Instead of relying on manually supplied contexts, the CMIB method first automatically generates a set of high-level basic clusterings from multiple global features; these are defined, for the first time, as visual contexts, since they provide overall information about the target images. Object category discovery is then formulated as a data compression procedure in which the content and the multiple visual contexts are maximally preserved through a "bottleneck." Specifically, two Bayesian networks are first built to characterize the relationship between data compression and information preservation. Finally, a novel sequential information-theoretic optimization is proposed to ensure the convergence of the CMIB objective function. Experimental results on seven real-world benchmark image datasets demonstrate that the CMIB method achieves better performance than state-of-the-art baselines.
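The abstract only outlines the optimization, so the sketch below is a rough illustration rather than the CMIB algorithm itself: a generic sequential, information-bottleneck-style reassignment loop over hard cluster labels, where each candidate move is scored by how much information the clustering preserves about the content histograms and about each context clustering. All names (sequential_ib, preserved_information, beta, content, contexts), the uniform prior, and the brute-force objective recomputation are illustrative assumptions.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) from a joint probability table (rows index A, columns index B)."""
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(joint > 0, np.log(joint / (pa * pb)), 0.0)
    return float(np.sum(joint * log_ratio))

def preserved_information(assign, content, contexts, n_clusters, beta):
    """Score of a hard clustering T: I(T; Y_content) + beta * sum_c I(T; C_c).

    `content` holds one bag-of-features histogram per image (rows); each entry
    of `contexts` is a vector of hard labels from one global-feature clustering
    (a "visual context").  A uniform prior p(x) = 1/n is assumed.
    """
    n = content.shape[0]
    p_ty = np.zeros((n_clusters, content.shape[1]))
    for i in range(n):
        p_ty[assign[i]] += content[i] / content[i].sum() / n      # p(t, y)
    score = mutual_information(p_ty)
    for ctx in contexts:
        p_tc = np.zeros((n_clusters, int(ctx.max()) + 1))
        for i in range(n):
            p_tc[assign[i], ctx[i]] += 1.0 / n                    # p(t, c)
        score += beta * mutual_information(p_tc)
    return score

def sequential_ib(content, contexts, n_clusters, beta=1.0, n_sweeps=10, seed=0):
    """Sequential draw-and-reassign optimization: each image is moved, in turn,
    to the cluster that maximizes the preserved-information score.  The score
    never decreases, so the loop stops at a local optimum or after n_sweeps."""
    rng = np.random.default_rng(seed)
    n = content.shape[0]
    assign = rng.integers(0, n_clusters, size=n)
    for _ in range(n_sweeps):
        moved = False
        for i in rng.permutation(n):
            old = assign[i]
            scores = []
            for t in range(n_clusters):
                assign[i] = t
                scores.append(preserved_information(assign, content, contexts,
                                                    n_clusters, beta))
            assign[i] = int(np.argmax(scores))
            moved = moved or (assign[i] != old)
        if not moved:
            break
    return assign

if __name__ == "__main__":
    # Toy run: 120 images with 40-bin content histograms and two hypothetical
    # context clusterings (e.g., k-means labels over two global descriptors).
    rng = np.random.default_rng(1)
    content = rng.random((120, 40))
    contexts = [rng.integers(0, 5, 120), rng.integers(0, 4, 120)]
    print(sequential_ib(content, contexts, n_clusters=8, n_sweeps=5)[:20])
```

Recomputing the full objective for every candidate move keeps the sketch easy to follow but scales poorly; a practical implementation would update the joint tables incrementally when an image is drawn out of and merged into a cluster.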
