Abstract

Accurate class and attribute recognition is a critical technique for converting unstructured product image data into a structured knowledge base, which in turn provides strong support for future product design. However, objects of different classes that share similar attributes may vary greatly in visual appearance, making it challenging to accurately recognize the objects and their attributes. Unlike traditional multi-label image recognition, an object's attribute is high-level semantic information that does not correspond to specific regions of the object, so recognizing it requires learning more fine-grained features that represent the latent high-level semantics of the attribute across object categories. To address these problems, this paper proposes a self-supervised method called the Deconstruction and Reconstruction Learning Network (DRLN). DRLN learns more fine-grained, local features of the input product image through a self-supervised task that deconstructs the input images by randomly shuffling their local regions and then reconstructs the features of the corresponding deconstructed images. In addition, the model is optimized end to end by learning from multiple tasks: multi-label classification, adversarial discrimination, and location alignment. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods for multi-label learning on both our product image dataset and another publicly available attribute recognition dataset. To facilitate future research in this field, all datasets and code are available online at https://github.com/Yong-DAI/DRLN.
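
The deconstruction step described above, randomly shuffling the local regions of an input image, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `deconstruct` and `reconstruct`, the `grid` and `seed` parameters, and the use of a plain 2-D list as the image are all assumptions made for illustration. In particular, the paper reconstructs features of the deconstructed images, whereas this sketch only inverts the pixel-level permutation, to show that the recorded permutation is enough to recover the original region layout.

```python
import random


def deconstruct(image, grid, seed=None):
    """Split a 2-D image (list of rows) into grid x grid patches and shuffle them.

    Hypothetical helper for illustration only. Returns the shuffled image and
    the permutation used, where perm[k] is the index (row-major) of the source
    patch that was placed into target slot k.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image sides must be divisible by grid")
    ph, pw = height // grid, width // grid  # patch height and width

    perm = list(range(grid * grid))
    random.Random(seed).shuffle(perm)

    shuffled = [[None] * width for _ in range(height)]
    for target, source in enumerate(perm):
        tr, tc = divmod(target, grid)  # target patch row/column
        sr, sc = divmod(source, grid)  # source patch row/column
        for dy in range(ph):
            for dx in range(pw):
                shuffled[tr * ph + dy][tc * pw + dx] = image[sr * ph + dy][sc * pw + dx]
    return shuffled, perm


def reconstruct(shuffled, perm, grid):
    """Undo deconstruct() by moving every patch back to its source slot."""
    height, width = len(shuffled), len(shuffled[0])
    ph, pw = height // grid, width // grid

    restored = [[None] * width for _ in range(height)]
    for target, source in enumerate(perm):
        tr, tc = divmod(target, grid)
        sr, sc = divmod(source, grid)
        for dy in range(ph):
            for dx in range(pw):
                restored[sr * ph + dy][sc * pw + dx] = shuffled[tr * ph + dy][tc * pw + dx]
    return restored


if __name__ == "__main__":
    # A 4x4 toy "image" whose pixel values encode their original position.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    shuffled, perm = deconstruct(image, grid=2, seed=0)
    print("permutation:", perm)
    print("restored equals original:", reconstruct(shuffled, perm, grid=2) == image)
```

In a training pipeline, the shuffled image would be fed to the network and the permutation kept as the self-supervision signal; how DRLN uses it in the location alignment task is specified in the paper and the linked repository, not in this sketch.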

