Abstract

Context-aware devices and applications can benefit when audio from real-life environments is categorized into different acoustic scenes, a task referred to as acoustic scene classification (ASC). However, scene labels are database-dependent, and for most ASC applications a general estimate of the type of surroundings (e.g., indoor or outdoor) may suffice instead of explicit scene labels (e.g., home, park). State-of-the-art ASC systems have generally relied on mel-scaled cepstral features, yet the characteristics that differentiate one scene class from another are embedded in the texture of the time-frequency representation of the audio. In this paper, we propose to capture this textural information through statistics of local binary patterns of the mel-filterbank energies. Experiments were conducted on two datasets sharing the same scene classes but differing in audio sample duration and total amount of data. The proposed framework outperforms two mel-scale based benchmark systems.
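The abstract does not spell out implementation details, but a minimal sketch of the kind of descriptor it describes (histogram statistics of local binary patterns computed over log mel-filterbank energies) could look as follows. The libraries (librosa, scikit-image) and all parameter values (n_mels, P, R) are illustrative assumptions, not the paper's actual configuration.

import numpy as np
import librosa
from skimage.feature import local_binary_pattern

def lbp_mel_features(path, n_mels=40, P=8, R=1):
    """Histogram of uniform LBP codes computed over log mel-filterbank
    energies: one way to summarize spectro-temporal texture."""
    y, sr = librosa.load(path, sr=None)
    # Log-scaled mel-filterbank energies form a 2-D time-frequency "image".
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    # Local binary pattern over that image; the 'uniform' variant
    # yields P + 2 distinct codes.
    lbp = local_binary_pattern(log_mel, P=P, R=R, method="uniform")
    # A normalized histogram of the codes serves as the texture descriptor.
    hist, _ = np.histogram(lbp.ravel(), bins=np.arange(P + 3), density=True)
    return hist

# Hypothetical usage: the resulting fixed-length vector can be fed to
# any standard classifier, e.g. an SVM.
# feats = lbp_mel_features("scene_clip.wav")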
