Abstract

The goal of this work is to build an audio information retrieval system that gives users flexibility in formulating their queries, from audio examples to naive text. Specifically, this paper focuses on using naive text queries to describe the information users desire. Naive text queries, however, raise interoperability issues between the annotation and retrieval processes because of the wide variety of possible audio descriptions. In this paper, we propose an intermediate audio description layer (iADL) to resolve these interoperability issues. The iADL comprises two axes, corresponding to semantic and onomatopoeic descriptions, derived from human-to-human communication experiments on how people express sounds verbally. Various text modeling schemes, such as latent semantic analysis (LSA) and latent topic models, are used to map the naive text onto the proposed iADL.
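The abstract does not detail how a naive text query is projected into a latent space, so the following is only a minimal illustrative sketch of the LSA step it names: TF-IDF term vectors reduced by truncated SVD. The annotation corpus, the example query, and the dimensionality are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch only: mapping naive text queries into an LSA
# latent space. The corpus, query, and n_components are hypothetical
# placeholders, not taken from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical annotation corpus: free-form descriptions of sounds.
annotations = [
    "a dog barking loudly outdoors",
    "gentle rain tapping on a window",
    "a car horn honking in traffic",
    "birds chirping in the morning",
]

# Build TF-IDF term vectors, then reduce them with truncated SVD (LSA).
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(annotations)

lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = lsa.fit_transform(tfidf)

# Project a naive text query into the same latent space; retrieval
# would then rank annotated sounds by similarity in this space.
query_vec = lsa.transform(vectorizer.transform(["dog barking"]))
print(query_vec)
```

In such a scheme, both annotations and queries live in one shared low-dimensional space, which is one way the interoperability gap between annotation and retrieval vocabularies could be bridged.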
