Automatic image annotation is a challenging research problem that involves a large number of tags and diverse features. Traditional shallow machine learning algorithms generalize poorly on such complex classification problems. We therefore propose automatic image annotation based on a stacked auto-encoder (SAE) to improve generalization. In this paper, two strategies, one targeting the annotation model and one targeting the annotation process, are proposed to address the main problem of unbalanced data in image annotation. 1) For the annotation model, to improve the annotation of low-frequency tags, we propose a balanced and stacked auto-encoder (BSAE) that strengthens training for low-frequency tags. Building on this model, we propose a robust BSAE (RBSAE) algorithm, which strengthens the training of sub-BSAE models by group, to improve annotation stability. This strategy gives the model itself a strong ability to handle unbalanced data. 2) For the annotation process, we propose an attribute discrimination annotation (ADA) framework. Given an unknown image, we construct a local equilibrium dataset based on that image and discriminate whether the image has a high-frequency or low-frequency attribute, which determines the annotation process to use: the local semantic propagation (LDE-SP) algorithm annotates low-frequency images, and the RBSAE algorithm annotates high-frequency images. This strategy improves the overall annotation effect and gives the annotation process a strong ability to handle unbalanced data. For each SAE annotation model (including BSAE and RBSAE), we propose two optimization methods, one based on non-linear optimization and one based on linear optimization. Experimental results on three benchmark datasets show that the proposed model outperforms previous models on many performance indices.
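
The routing idea behind the ADA framework can be illustrated with a minimal sketch. The abstract does not specify how the local dataset is built or how the high/low-frequency attribute is decided, so everything below is an assumption made for illustration: the nearest-neighbor neighborhood, the Euclidean distance, the `low_freq_threshold` cutoff, the majority rule, and the function names `discriminate` and `choose_annotator` are all hypothetical. Only the final mapping (low-frequency images go to LDE-SP, high-frequency images go to RBSAE) comes from the text.

```python
import math
from collections import Counter

# Toy training set (hypothetical): 2-D feature vectors and their tag sets.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
tag_sets = [{"sky"}, {"sky"}, {"sky", "sea"}, {"sky"}, {"lynx"}, {"lynx", "snow"}]


def tag_frequencies(tag_sets):
    """Count how many training images carry each tag."""
    return Counter(tag for tags in tag_sets for tag in tags)


def nearest_neighbors(query, features, k):
    """Return indices of the k training images closest to the query (Euclidean)."""
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    return order[:k]


def discriminate(query, features, tag_sets, k=3, low_freq_threshold=2):
    """Label the query image 'low' or 'high' frequency.

    Assumed rule (not from the paper): a tag is low-frequency if it labels at
    most `low_freq_threshold` training images, and the query is 'low' when at
    least half of the tags carried by its k nearest neighbors are low-frequency.
    """
    freq = tag_frequencies(tag_sets)
    neighbor_tags = [
        tag for i in nearest_neighbors(query, features, k) for tag in tag_sets[i]
    ]
    if not neighbor_tags:
        return "high"
    low = sum(1 for tag in neighbor_tags if freq[tag] <= low_freq_threshold)
    return "low" if 2 * low >= len(neighbor_tags) else "high"


def choose_annotator(query, features, tag_sets, **kwargs):
    """Route the query: LDE-SP for low-frequency images, RBSAE for high-frequency ones."""
    attribute = discriminate(query, features, tag_sets, **kwargs)
    return "LDE-SP" if attribute == "low" else "RBSAE"
```

In this toy data, a query near the two rare "lynx" images would be routed to the propagation branch, while a query near the common "sky" images would go to the RBSAE branch. The actual annotators are not implemented here; the sketch only shows the discrimination step that selects between them.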