Semi-automatic audio semantic concept discovery for multimedia retrieval

Yipei Wang, Florian Metze, Shourabh Rawat

doi:10.1109/icassp.2014.6853822

Abstract

A huge number of videos on the Internet have little textual information, which makes video retrieval from a text query challenging. Previous work explored semantic concepts for content analysis to assist retrieval. However, human-defined concepts might fail to cover the data, and there is a potential gap between these concepts and the semantics expected from a user's query. Also, building a corpus is expensive and time-consuming. To address these issues, we propose a semi-automatic framework to discover the semantic concepts. We limit ourselves to the audio modality here. In the paper, we also discuss how to select a meaningful vocabulary from the discovered hierarchical sub-categories and provide an approach to detect all the concepts without further annotation. We evaluate the method on the NIST 2011 multimedia event detection (MED) dataset.

There is a continuing growth of video collections available for searching over the Internet. Given a query, search engines retrieve relevant videos by analyzing their captions or textual descriptions. This initial method faces a big problem: a large proportion of videos lack detailed textual information. Even for those with rich textual descriptions, a gap between the content of the video and the given textual information is inevitable. To address the problem, advanced technologies in multimedia content analysis have become very popular recently. Previous work has explored detecting semantic concepts in multiple modalities to capture the embedded semantics in the multimedia stream (1) (2) (3). In this paper, we limit our discussion to audio semantic concepts. Audio semantic concepts are often defined by humans based on their understanding of the specific application and their observations of limited data. After annotating a training corpus for these defined concepts, people apply multiple supervised methods for semantic concept detection.
Previous studies have adopted methods from speech recognition and speaker identification (2) (3). These approaches have been shown to be effective on certain datasets for certain applications. However, these approaches have potential problems. Firstly, defining a proper vocabulary of semantic concepts is time-consuming. Human experts have to observe a large amount of data and summarize the patterns into a number of semantic concepts based on their understanding. In this situation, it is very likely that these semantic concepts fail to cover the data, and the vocabulary might be ineffective for retrieving the information needed for the application. These problems become more serious when new videos are continuously added to the collection in practical applications. Secondly, the generalization of the human-defined vocabulary is another problem. Domain-specific semantic concepts become useless in new domains. And other applications might

Highlights

There is a continuing growth of video collections available for searching over the Internet
Human experts have to observe a large amount of data and summarize the patterns into a number of semantic concepts based on their understanding
It is very likely that these semantic concepts fail to cover the data and the vocabulary might be ineffective to retrieve the information needed for the application

Summary

INTRODUCTION

There is a continuing growth of video collections available for searching over the Internet. Human experts have to observe a large amount of data and summarize the patterns into a number of semantic concepts based on their understanding. In this situation, it is very likely that these semantic concepts fail to cover the data, and the vocabulary might be ineffective for retrieving the information needed for the application. These problems become more serious when new videos are continuously added to the collection in practical applications. It is still impossible to retrieve content-related audio given a text query. Another problem of the purely unsupervised method is the gap between semantic similarity and acoustic similarity.

Overview

Learning Acoustic Descriptors

Hierarchical Sub-categories Discovery

Semantic Concept Vocabulary Selection

The dataset and seed annotations

Experiment setup

Result and Analysis

CONCLUSION

Vocabulary Selection for MED

Publication Date: May 1, 2014
Citations: 8
License type: cc-by

Similar Papers

Complex Event Detection via Event Oriented Dictionary Learning
Yan Yan ... Nicu Sebe
Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence | VOL. 29
04 Mar 2015

Multimodal speaker/speech recognition using lip motion, lip texture and audio
H.E Çetingül ... A.M Tekalp
Signal processing | VOL. 86
02 Jun 2006

Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models
Mariusz Kubanek
01 Jan 2006

Event oriented dictionary learning for complex event detection.
Yan Yan ... Nicu Sebe
IEEE Transactions on Image Processing | VOL. 24
16 Mar 2015
