Abstract

Automatically understanding and discriminating different users' liking for an image is a challenging problem. This is because the relationship between image features (even semantic ones extracted by existing tools, e.g., faces, objects, and so on) and users' likes is non-linear and influenced by several subtle factors. This paper presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows for the transfer of semantic knowledge between the two modalities. Feature selection is applied before learning the deep representation to identify the features that are important for a user to like an image. The proposed representation is shown to be effective both in discriminating users based on the images they like and in recommending images that a given user would like, outperforming state-of-the-art feature representations by ~15%-20%. Beyond this test-set performance, an attempt is made to qualitatively understand the representations learned by the deep architecture used to model user likes.
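
To make the described architecture concrete, the sketch below shows one possible (hypothetical) form of such a bi-modal representation: two modality-specific encoders, one for visual features and one for tag embeddings, plus a linear mapping layer that transfers knowledge from the visual to the textual representation. All layer sizes, names, and the alignment loss are illustrative assumptions and not the paper's actual architecture; the feature-selection step mentioned above is omitted here.

```python
# Minimal sketch (not the authors' implementation) of a bi-modal image
# representation with a cross-modal mapping step. Dimensions and the
# alignment objective are assumptions for illustration only.
import torch
import torch.nn as nn

class BiModalRepresentation(nn.Module):
    def __init__(self, visual_dim=4096, text_dim=300, hidden_dim=512, shared_dim=128):
        super().__init__()
        # Modality-specific encoders (hypothetical sizes).
        self.visual_encoder = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, shared_dim))
        self.text_encoder = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, shared_dim))
        # Mapping between the two modalities' representations, standing in
        # for the paper's cross-modal knowledge-transfer step.
        self.visual_to_text = nn.Linear(shared_dim, shared_dim)

    def forward(self, visual_feats, text_feats):
        v = self.visual_encoder(visual_feats)
        t = self.text_encoder(text_feats)
        mapped = self.visual_to_text(v)       # project visual code toward text space
        joint = torch.cat([v, t], dim=-1)     # joint bi-modal representation
        return joint, mapped, t

# Usage: visual features could come from a pretrained CNN and text features
# from averaged tag embeddings (both assumed inputs here).
model = BiModalRepresentation()
visual = torch.randn(8, 4096)   # e.g., CNN features for 8 images
text = torch.randn(8, 300)      # e.g., averaged tag embeddings for the same images
joint, mapped, t = model(visual, text)
# An assumed cross-modal alignment loss, e.g., MSE between the mapped visual
# representation and the text representation:
alignment_loss = nn.functional.mse_loss(mapped, t)
```

The joint representation could then be fed to a per-user classifier of likes; the alignment loss is one simple way to realize the "transfer of semantic knowledge between the two modalities" described above, chosen here purely for illustration.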
