Towards optimal audio "keywords" detection for audio content analysis and discovery

Lie Lu, Alan Hanjalic

doi:10.1145/1180639.1180825

Abstract

Natural semantic sound clusters in an audio document, also referred to as audio elements, can be seen as analogous to words in a text document. From the obtained set of audio elements, one can detect the key audio elements, or audio keywords, which are the most prominent in characterizing the content of the audio data. As such, they can be of great use for automatic audio content analysis and discovery. Motivated by the limitations of existing methods for key audio element detection, we propose in this paper a novel unsupervised approach to audio element weighting that uses multiple audio documents, analogous to word weighting in text document analysis. In our approach, dominant feature vectors (DFVs) are first extracted from each audio element and used to measure the similarity between audio elements. Based on this similarity, the occurrence probability of each audio element in the different audio documents is estimated. Then, four factors, namely the expected term frequency, expected inverse document frequency, expected term duration, and expected inverse document duration, are calculated and combined to give the importance weight of each audio element. An evaluation of the obtained audio keywords and of their usability for auditory scene segmentation and audio document clustering, performed on 5 hours of diverse audio data, shows highly promising results.
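The abstract names the four weighting factors but does not give their formulas. The following is a minimal sketch of the general idea of "expected", probability-weighted TF-IDF-style scoring, and it is not the authors' method. It assumes: (a) each document is split into segments, and `prob[d][s][i]` is a given probability that segment `s` of document `d` is an instance of audio element `i` (for example, derived from DFV similarity); (b) segments occur independently; (c) "inverse document duration" is modeled here, hypothetically, as the log ratio of total corpus duration to the expected duration of documents containing the element; and (d) the four factors are combined by simple multiplication. The function name `element_weights` and all parameter names are hypothetical.

```python
import math


def element_weights(prob, dur):
    """Hypothetical soft TF-IDF-style weighting of audio elements.

    prob[d][s][i]: assumed probability that segment s of document d is an
                   instance of audio element i (e.g. from DFV similarity).
    dur[d][s]:     duration of segment s of document d, in seconds.
    Returns w[d][i], an illustrative importance weight of element i in document d.
    """
    n_docs = len(prob)
    n_elems = len(prob[0][0])
    doc_len = [sum(ds) for ds in dur]
    total_len = sum(doc_len)

    # Probability that element i occurs at least once in document d,
    # assuming independence between segments (an assumption of this sketch).
    p_in_doc = [
        [1.0 - math.prod(1.0 - seg[i] for seg in doc) for i in range(n_elems)]
        for doc in prob
    ]

    # Expected document frequency, and expected total duration of the
    # documents that contain element i.
    exp_df = [sum(p_in_doc[d][i] for d in range(n_docs)) for i in range(n_elems)]
    exp_dd = [sum(p_in_doc[d][i] * doc_len[d] for d in range(n_docs)) for i in range(n_elems)]

    weights = []
    for d, doc in enumerate(prob):
        row = []
        for i in range(n_elems):
            etf = sum(seg[i] for seg in doc)                        # expected term frequency
            etd = sum(seg[i] * t for seg, t in zip(doc, dur[d]))    # expected term duration
            eidf = math.log(n_docs / exp_df[i]) if exp_df[i] > 0 else 0.0
            eidd = math.log(total_len / exp_dd[i]) if exp_dd[i] > 0 else 0.0
            row.append(etf * eidf * etd * eidd)  # assumed multiplicative combination
        weights.append(row)
    return weights


# Toy corpus: element 0 appears in both documents, element 1 only in document 0.
prob = [
    [[1.0, 0.0], [0.0, 1.0]],  # document 0: segment 0 -> element 0, segment 1 -> element 1
    [[1.0, 0.0], [1.0, 0.0]],  # document 1: both segments -> element 0
]
dur = [[2.0, 3.0], [1.0, 1.0]]
w = element_weights(prob, dur)
```

In this toy case, element 0 occurs in every document, so both of its inverse factors are zero and it receives no weight, while element 1, which is specific to document 0, gets a positive weight there. This mirrors the text-retrieval intuition that terms spread across all documents are poor keywords. With hard 0/1 probabilities the expected counts reduce to ordinary counts; the soft version matters when an audio element can only be matched to segments probabilistically.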
