Abstract

This paper investigates whether low-level audio, visual, and textual features correlate with movie content similarity. To focus on a well-defined and controlled case, we built a small dataset of movie scenes drawn from three sequel movies. Manual annotation then produced a ground-truth similarity matrix over the selected scenes. Three similarity matrices (one per modality) were computed automatically, based on Gaussian Mixture Models for the audio and visual streams and on Latent Semantic Indexing for the text. We evaluated these automatically extracted similarities, along with two simple fusion approaches, and the results indicate that low-level features can yield an accurate representation of movie content. Moreover, the fusion approaches outperform the individual modalities, a strong indication that the individual modalities capture diverse, complementary aspects of content similarity. Finally, we evaluated the extracted similarities against different groups of human annotators, grouped by what each annotator interprets as similar, and found that different groups correlate better with different modalities. This last result is particularly important: it can be exploited in (a) personalized content-based retrieval and recommender systems and (b) locally weighted fusion schemes, in future research.
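To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation: the distance measure (a Monte-Carlo approximation of the symmetric KL divergence between per-scene GMMs), the component and topic counts, the exp(-d) mapping from distance to similarity, and all function names are illustrative assumptions, with scikit-learn used purely for convenience.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def gmm_similarity(feature_sets, n_components=4, n_samples=500, seed=0):
    """Pairwise scene similarity from per-scene GMMs (audio or visual).

    feature_sets: list of (n_frames_i, n_dims) arrays of low-level
    features, one array per scene. Distance between two scenes is a
    Monte-Carlo estimate of the symmetric KL divergence between their
    GMMs, mapped to a similarity in (0, 1] via exp(-d). (Assumed
    choices; the paper does not specify this exact measure here.)
    """
    gmms = [GaussianMixture(n_components, random_state=seed).fit(X)
            for X in feature_sets]
    n = len(gmms)
    S = np.eye(n)
    for i in range(n):
        Xi, _ = gmms[i].sample(n_samples)   # draws from scene i's model
        for j in range(i + 1, n):
            Xj, _ = gmms[j].sample(n_samples)
            # Symmetric KL: E_p[log p - log q] + E_q[log q - log p]
            d = (np.mean(gmms[i].score_samples(Xi) - gmms[j].score_samples(Xi))
                 + np.mean(gmms[j].score_samples(Xj) - gmms[i].score_samples(Xj)))
            # Clamp: the MC estimate can dip below 0 from sampling noise
            S[i, j] = S[j, i] = np.exp(-max(d, 0.0))
    return S

def lsi_similarity(subtitles, n_topics=10):
    """Pairwise scene similarity via cosine distance in an LSI
    (truncated-SVD) topic space; n_topics must be below the
    vocabulary size."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(subtitles)
    topics = TruncatedSVD(n_components=n_topics).fit_transform(tfidf)
    return cosine_similarity(topics)

def fuse(matrices, weights=None):
    """Late fusion: weighted average of per-modality similarity
    matrices (uniform weights by default)."""
    weights = weights or [1.0 / len(matrices)] * len(matrices)
    return sum(w * S for w, S in zip(weights, matrices))
```

Under this reading, the per-group result in the abstract would correspond to tuning the fusion weights per annotator group rather than globally, which is the "locally weighted fusion" direction the authors point to.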
