Abstract
Social media platforms allow users to annotate photos with tags, which greatly facilitate semantic understanding, search, and retrieval of photos. However, because user tagging is manual, ambiguous, and personalized, the tags of a photo often appear in a random order and may even be irrelevant to its visual content. To automatically compute tag relevance for a given photo, we propose a tag ranking scheme based on voting from photo neighbors derived from multimodal information. Specifically, we determine photo neighbors by leveraging geo, visual, and semantic concepts derived from spatial information, visual content, and textual metadata, respectively. We use high-level features instead of traditional low-level features to compute tag relevance. Experimental results on a representative set of 203,840 photos from the YFCC100M dataset confirm that these multimodal concepts complement one another in computing tag relevance. Moreover, we explore the fusion of multimodal information to refine tag ranking using recall-based weighting. Experimental results on the same representative set confirm that the proposed algorithm outperforms state-of-the-art methods.
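To make the neighbor-voting idea concrete, the following is a minimal Python sketch of tag relevance computed as weighted votes from per-modality photo neighbors, fused with recall-based weights. It is not the paper's implementation; the function, the neighbor lists, and the weights are illustrative assumptions, and the paper's exact scoring may differ.

# A minimal sketch of neighbor-voting tag relevance with recall-based
# fusion. All names (neighbor tag sets, modality weights) are
# hypothetical placeholders, not the paper's actual data structures.
from collections import Counter

def tag_relevance(photo_tags, neighbors_by_modality, modality_weights):
    """Rank a photo's tags by weighted votes from its neighbors.

    photo_tags: list[str] -- tags of the query photo.
    neighbors_by_modality: dict[str, list[set[str]]] -- for each modality
        ('geo', 'visual', 'semantic'), the tag sets of the photo's
        nearest neighbors under that modality.
    modality_weights: dict[str, float] -- assumed recall-based fusion weights.
    """
    scores = Counter()
    for modality, neighbor_tag_sets in neighbors_by_modality.items():
        weight = modality_weights[modality]
        for neighbor_tags in neighbor_tag_sets:
            for tag in photo_tags:
                if tag in neighbor_tags:
                    scores[tag] += weight  # one weighted vote per neighbor
    return sorted(photo_tags, key=lambda t: scores[t], reverse=True)

# Example: rank three tags using two geo neighbors and one visual neighbor.
ranked = tag_relevance(
    ["beach", "sunset", "me"],
    {"geo": [{"beach", "sea"}, {"beach"}], "visual": [{"sunset", "sky"}]},
    {"geo": 0.6, "visual": 0.4},
)
print(ranked)  # ['beach', 'sunset', 'me']

Under this sketch, a tag accumulates one vote per neighbor that also carries it, scaled by the modality's weight, so tags supported by many neighbors across modalities rise to the top while personalized, content-irrelevant tags (e.g., "me") sink.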