Automated music emotion recognition (MER) is a challenging task in Music Information Retrieval with wide-ranging applications. Some recent studies pose MER as a continuous regression problem in the Arousal-Valence (AV) plane. These studies are variations on a common architecture: a universal model of emotional response, a shared repertoire of low-level audio features, a bag-of-frames approach to audio analysis, and relatively small data sets. These approaches achieve some success at MER and suggest that further improvements are possible with current technology. Our contribution to the state of the art is to examine just how far one can go within this framework and to identify its limitations. We present the results of a systematic study conducted in an attempt to maximize the prediction performance of an automated MER system using the architecture described. We begin with a carefully constructed data set, emphasizing quality over quantity. We address affect induction rather than affect attribution. We consider a variety of algorithms at each stage of the training process, from preprocessing to feature selection and model selection, and we report the results of extensive testing. We found that: (1) none of the variations we considered leads to a substantial improvement in performance, which we present as evidence of a limit on what is achievable under this architecture, and (2) the small data sets commonly used in the MER literature limit the possibility of improving the set of features used in MER, due to the phenomenon of Subset Selection Bias. We conclude with some proposals for advancing the state of the art.
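The Subset Selection Bias mentioned in finding (2) can be illustrated with a small simulation (a sketch, not taken from the paper): when candidate features are selected on the same small sample used to score them, even pure-noise features appear predictive, while an evaluation on fresh data reveals chance-level performance. All names, sizes, and data below are hypothetical.

```python
# Hypothetical illustration of Subset Selection Bias.
# With many candidate features and few samples, the feature that best
# "predicts" the labels on the selection set looks good even though
# every feature here is an independent coin flip (pure noise).
import random

random.seed(0)

N_TRAIN, N_TEST, N_FEATURES = 20, 1000, 200  # small set, many candidates

def coin(n):
    """n independent fair coin flips (0/1)."""
    return [random.randint(0, 1) for _ in range(n)]

# Labels and all candidate features are mutually independent noise.
train_y = coin(N_TRAIN)
train_X = [coin(N_TRAIN) for _ in range(N_FEATURES)]

def match_rate(feature, labels):
    return sum(f == y for f, y in zip(feature, labels)) / len(labels)

# Select the single noise feature (possibly inverted) that best matches
# the labels on the small selection set.
best_acc, best_flip = 0.0, False
for feat in train_X:
    a = match_rate(feat, train_y)
    for flip, acc in ((False, a), (True, 1 - a)):
        if acc > best_acc:
            best_acc, best_flip = acc, flip

train_acc = best_acc  # optimistic: selection and evaluation share data

# Honest evaluation: fresh samples of the same (noise) feature and labels.
test_y = coin(N_TEST)
test_feat = coin(N_TEST)
pred = [1 - f if best_flip else f for f in test_feat]
test_acc = sum(p == y for p, y in zip(pred, test_y)) / N_TEST

print(f"accuracy on the selection set: {train_acc:.2f}")  # well above 0.5
print(f"accuracy on held-out data:     {test_acc:.2f}")   # near chance
```

The gap between the two numbers is the bias itself: with 200 candidates and only 20 samples, the best match on the selection set is almost certainly well above chance, so feature rankings computed on a small data set cannot be trusted to generalize.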