Abstract

Temporal feature integration refers to a set of strategies that attempt to capture the information conveyed in the temporal evolution of a signal. It has been applied extensively in semantic audio, where it has shown performance improvements over standard frame-based audio classification methods. This paper investigates the potential of an enhanced temporal feature integration method for classifying environmental sounds. The proposed method combines newly introduced integration functions that capture the shape of the texture window with standard functions such as the mean and standard deviation, in a classification scheme of 10 environmental sound classes. The results obtained from three classification algorithms show higher recognition accuracy than standard temporal integration with simple statistics, which indicates the discriminative ability of the new metrics.
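As a rough illustration of the baseline that the abstract calls standard temporal integration with simple statistics, the sketch below summarizes frame-level feature vectors over sliding texture windows using the mean and standard deviation. The function name `integrate_features` and the parameters `win_len` and `hop` are hypothetical and chosen only for this example. The newly introduced shape-capturing functions are not specified in this excerpt, so they are not reproduced here.

```python
from statistics import fmean, pstdev


def integrate_features(frames, win_len, hop):
    """Summarize frame-level features over texture windows.

    frames  : list of per-frame feature vectors (all of equal length)
    win_len : texture window length in frames (hypothetical parameter)
    hop     : step between consecutive windows in frames (hypothetical parameter)

    Returns one vector per window: the per-feature means followed by the
    per-feature (population) standard deviations.
    """
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")

    integrated = []
    # Only full windows are used; trailing frames that do not fill a window are dropped.
    for start in range(0, len(frames) - win_len + 1, hop):
        window = frames[start:start + win_len]
        # Transpose so that each tuple holds one feature's values across the window.
        columns = list(zip(*window))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        integrated.append(means + stds)
    return integrated


# Toy input: 6 frames with 2 features each (made-up values, not real audio features).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
result = integrate_features(frames, win_len=4, hop=2)
```

Each output vector has twice the dimensionality of a frame-level vector. Under this reading, additional integration functions would simply append further per-feature values to each window's vector before classification.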

Highlights

  • Environmental Sound Recognition (ESR) is a semantic audio application that has received considerable attention in recent years

  • This paper extends the investigation of a temporal feature integration method recently introduced in Reference [10], where simple statistics combined with newly proposed functions that capture the texture window shape were evaluated on a speech/music/other classification task

  • A new temporal feature integration method, which introduces a set of robust and lightweight measures to supplement the common statistical measures, is tested for its effectiveness in environmental sound recognition tasks

Summary

Introduction

Environmental Sound Recognition (ESR) is a semantic audio application that has received considerable attention in recent years. The goal of ESR is to capture environmental sounds using audio sensors and assign them to predefined categories (or classes) by applying semantic labels to them. An emerging trend in audio is the incorporation of ESR applications into portable or wearable devices [2]. For example, a mobile device could use ESR to change its notification mode automatically based on knowledge of the user’s surroundings [3]. The increasing interest in the field of ESR has heightened the need for optimized algorithms and processing workflows that can achieve higher recognition accuracy while reducing computational requirements.
