Abstract
We propose an enhanced latent topic model, based on latent Dirichlet allocation (LDA) and convolutional neural networks (CNNs), for event classification and annotation in images. Our model builds on the semantic structure relating events, objects, and scenes in images. Starting from initial labels extracted by CNNs, and optionally from user-defined tags, we estimate the event category and the final annotation of an image through a refinement process based on the expectation–maximization (EM) algorithm. The EM steps progressively ascertain the event category and refine the final annotation of the image. Our model can be thought of as a two-level annotation system: the first level derives the image event from CNN labels and image tags, and the second level derives the final annotation, which consists of event-related objects and scenes. Experimental results show that the proposed model yields better classification and annotation performance on two standard datasets, UIUC-Sports and WIDER.
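To make the two-level idea concrete, the following is a minimal sketch and not the model proposed in this work: it swaps the LDA-based topic model for a plain mixture of multinomials fitted by EM over per-image label counts. The function names (`fit_event_mixture`, `annotate`), the smoothing constant `alpha`, the label vocabulary, and the toy counts are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_event_mixture(counts, n_events, n_iter=50, alpha=1e-2, seed=0):
    """Fit a mixture of multinomials with EM (a simplified stand-in for LDA).

    counts: (n_images, n_labels) array of initial label counts per image,
            e.g. CNN labels plus user tags (assumed input format).
    Returns event priors pi, per-event label distributions phi, and the
    per-image event responsibilities resp.
    """
    rng = np.random.default_rng(seed)
    n_images, n_labels = counts.shape
    pi = np.full(n_events, 1.0 / n_events)
    phi = rng.dirichlet(np.ones(n_labels), size=n_events)

    def e_step(pi, phi):
        # Log posterior over events for each image, up to a constant.
        log_r = np.log(pi) + counts @ np.log(phi).T
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        return r / r.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        resp = e_step(pi, phi)
        # M-step with additive smoothing so no probability reaches zero.
        pi = (resp.sum(axis=0) + alpha) / (n_images + n_events * alpha)
        phi = resp.T @ counts + alpha
        phi /= phi.sum(axis=1, keepdims=True)

    return pi, phi, e_step(pi, phi)

def annotate(resp_row, phi, label_names, top_k=3):
    """Level 1: pick the most probable event. Level 2: return the top_k
    labels most strongly associated with that event."""
    event = int(np.argmax(resp_row))
    top = np.argsort(phi[event])[::-1][:top_k]
    return event, [label_names[j] for j in top]

# Toy data (hypothetical): six images over a six-label vocabulary.
labels = ["boat", "water", "sail", "ball", "court", "net"]
counts = np.array([
    [2, 3, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 2, 2],
    [0, 1, 0, 2, 3, 1],
], dtype=float)

pi, phi, resp = fit_event_mixture(counts, n_events=2)
event, tags = annotate(resp[0], phi, labels, top_k=3)
```

In this sketch the events are unlabeled clusters; a supervised variant would tie each component to a known event class, which is closer in spirit to event classification as described above.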