This paper describes a novel system that uses music emotion and human faces as features for automatic highlights extraction from drama video. These high-level audiovisual features are used because music evokes an emotional response in the viewer and characters express emotion through their faces. In addition, a novel scheme is developed to improve the accuracy of music emotion recognition in drama video. Specifically, emotion recognition is performed not on the input audio signal but on the noise-free music available from the album of incidental music, with the presence of incidental music in the video detected by an audio fingerprinting technique. In addition to the conventional subjective evaluation, we propose a new metric for quantitative performance evaluation of highlights extraction. Experiments conducted on four different types of drama video demonstrate that the proposed system significantly outperforms the baseline systems in terms of both subjective and objective measures.