Abstract

With the increasing use of audio sensors in user-generated content (UGC) collections, semantic concept annotation from video soundtracks has become an important research problem. In this paper, we investigate reducing the semantic gap of the traditional data-driven bag-of-audio-words audio annotation approach by exploiting large amounts of wild audio data and their rich user tags, from which we derive a new feature representation based on semantic class model distance. We conduct experiments on the data collection from the HUAWEI Accurate and Fast Mobile Video Annotation Grand Challenge 2014, and we also fuse the audio-only annotation system with a visual-only system. The experimental results show that our audio-only concept annotation system detects semantic concepts significantly better than random guessing, and that the new feature representation achieves annotation performance comparable to the bag-of-audio-words feature while providing more semantic interpretation in its output. The results also show that the audio-only system provides significant complementary information to the visual-only concept annotation system, boosting performance and supporting better interpretation of semantic concepts both visually and acoustically.
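To make the idea behind the two feature representations concrete, the following is a minimal sketch, not the authors' implementation: it builds a bag-of-audio-words histogram from frame-level acoustic features, then maps each clip to a semantic class-model-distance vector by scoring it against per-tag classifiers that would, in the paper's setting, be trained on large collections of tagged wild audio. The tag names, model choices, dimensions, and toy data here are illustrative assumptions.

```python
# Sketch of bag-of-audio-words quantization followed by a semantic
# class-model-distance feature. All data below is synthetic placeholder data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in frame-level acoustic features (e.g., MFCC-like vectors per clip).
n_clips, frames_per_clip, feat_dim = 40, 120, 20
clips = [rng.normal(size=(frames_per_clip, feat_dim)) for _ in range(n_clips)]

# Bag-of-audio-words: learn a codebook and quantize frames against it.
codebook_size = 64
codebook = KMeans(n_clusters=codebook_size, n_init=5, random_state=0)
codebook.fit(np.vstack(clips))

def boaw_histogram(frames: np.ndarray) -> np.ndarray:
    """L1-normalized histogram of codeword assignments for one clip."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook_size).astype(float)
    return hist / max(hist.sum(), 1.0)

boaw = np.array([boaw_histogram(c) for c in clips])

# Semantic class models: one classifier per user tag. In the paper's setting
# these would be trained on large tagged wild-audio data; the tags and labels
# here are random placeholders just to keep the sketch runnable.
tag_names = ["music", "speech", "crowd", "vehicle", "animal"]
tag_models = {}
for tag in tag_names:
    y = rng.integers(0, 2, size=n_clips)
    y[:2] = [0, 1]  # ensure both classes appear in this toy data
    tag_models[tag] = LogisticRegression(max_iter=1000).fit(boaw, y)

def semantic_feature(hist: np.ndarray) -> np.ndarray:
    """Signed distances to each tag model's decision boundary."""
    return np.array([m.decision_function(hist[None, :])[0]
                     for m in tag_models.values()])

# Each clip is now described by its distances to the semantic class models;
# this vector can feed the downstream concept-annotation classifier, and each
# dimension carries an interpretable tag name.
sem_features = np.array([semantic_feature(h) for h in boaw])
print(sem_features.shape)  # (n_clips, number of tags)
```

Because every dimension of the class-model-distance vector corresponds to a named tag, this kind of representation lends itself to the more semantically interpretable output described in the abstract, whereas bag-of-audio-words dimensions are anonymous codewords.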
