Abstract

Good surrogates that let people quickly grasp the gist of a video without viewing it in full are crucial to video retrieval and browsing systems. Although video retrieval systems use many kinds of textual and visual surrogates, audio surrogates are rare in practice. To evaluate the effectiveness of audio surrogates alone and in combination with one kind of visual surrogate, fast forwards, a user study with 48 participants was conducted. The study investigated the effects of manually and automatically generated spoken keywords and spoken descriptions, rendered with a text-to-speech synthesizer, on six specific video gisting tasks. Results demonstrate that manually generated spoken descriptions are better than both manually generated spoken keywords and fast forwards for video gisting. Spoken keywords, whether manually or automatically generated, and fast forwards are both better than automatically extracted descriptions. High-quality spoken summaries were found to be very effective for video gisting. Combining fast forwards with either type of spoken text was not significantly better than any of the individual spoken surrogates; however, the visual elements added subjective value to the user experience. Adding spoken descriptions or keywords as surrogates to video retrieval and browsing systems is recommended.
