Abstract
Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, considerable effort has been devoted to allowing machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most common audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, subsequently dividing them by domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains). The description of the approaches is accompanied by recent examples of their application to machine hearing problems.
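To make the domains of computation mentioned above more concrete, the following minimal Python sketch extracts one feature from each of three of those domains (time, frequency and cepstral). It relies on the librosa library; the example signal, frame parameters and number of coefficients are illustrative assumptions and are not prescribed by the review.

```python
# Illustrative sketch: one feature per domain of analysis.
# Assumes librosa is installed; parameter choices are for illustration only.
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # any mono signal works here

# Time domain: zero-crossing rate per frame.
zcr = librosa.feature.zero_crossing_rate(y)

# Frequency domain: spectral centroid per frame.
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)

# Cepstral domain: Mel-frequency cepstral coefficients (perceptually motivated).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(zcr.shape, centroid.shape, mfcc.shape)
```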
Highlights
Endowing machines with sensing capabilities similar to those of humans is a long-pursued goal in several engineering and computer science disciplines. Ideally, we would like machines and computers to be aware of their immediate surroundings as human beings are.
As defined by Mitrović et al. [17], this feature is a two-dimensional representation of acoustic versus modulation frequency built upon a specific loudness sensation; it is obtained by Fourier analysis of the critical bands over time, followed by a weighting stage inspired by the human auditory system (see the sketch after these highlights).
This work has presented an up-to-date review of the most relevant audio feature extraction techniques related to machine hearing that have been developed for the analysis of speech, music and environmental sounds.
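To illustrate the highlighted acoustic-versus-modulation-frequency feature, the following Python sketch computes a fluctuation-pattern-style representation: critical-band energies are tracked over time, Fourier-analyzed along the time axis, and weighted at the modulation frequencies where human sensitivity to fluctuation peaks. The mel approximation of critical bands, the dB compression standing in for specific loudness, and the Gaussian weighting are simplifying assumptions rather than the exact procedure of Mitrović et al. [17].

```python
# Hedged sketch of a rhythm/fluctuation-pattern-style feature:
# critical-band (acoustic) frequency vs. modulation frequency.
# Assumes librosa is installed; band count, frame sizes and the
# modulation-frequency weighting are illustrative choices.
import numpy as np
import librosa

def fluctuation_pattern(y, sr, n_bands=24, frame_len=1024, hop=512):
    # 1) Short-time power spectrogram.
    S = np.abs(librosa.stft(y, n_fft=frame_len, hop_length=hop)) ** 2

    # 2) Group spectral bins into critical-band-like channels
    #    (approximated here with a mel filter bank of n_bands filters).
    fb = librosa.filters.mel(sr=sr, n_fft=frame_len, n_mels=n_bands)
    bands = fb @ S                                  # (n_bands, n_frames)

    # 3) Loudness-like compression (dB as a crude stand-in for sone).
    loud = librosa.power_to_db(bands, ref=np.max)

    # 4) Fourier analysis of each band's loudness curve over time:
    #    the magnitude at each modulation frequency measures how
    #    strongly that band fluctuates at that rate.
    mod = np.abs(np.fft.rfft(loud, axis=1))         # (n_bands, n_mod_bins)
    mod_freqs = np.fft.rfftfreq(loud.shape[1], d=hop / sr)

    # 5) Perceptual weighting: emphasize modulation frequencies around
    #    a few Hz, where human fluctuation sensitivity peaks
    #    (a simple illustrative bell curve, not the published weighting).
    w = np.exp(-0.5 * ((mod_freqs - 4.0) / 4.0) ** 2)
    return mod * w, mod_freqs

# Usage: y, sr = librosa.load("example.wav"); fp, f_mod = fluctuation_pattern(y, sr)
```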
Summary
Endowing machines with sensing capabilities similar to those of humans (such as vision, hearing, touch, smell and taste) is a long-pursued goal in several engineering and computer science disciplines. As the reader may have deduced, machine hearing is an extremely complex and daunting task given the wide diversity of possible audio inputs and application scenarios. For this reason, it is typically subdivided into smaller subproblems, and most research efforts are focused on solving simpler, more specific tasks. Other kinds of sound sources coming from our environment (e.g., traffic noise, sounds from animals in nature, etc.) do not exhibit such particularities (i.e., the distinctive structure of speech and music), or at least not in such a clear way. These sounds, which are neither speech nor music (hereafter denoted as environmental sounds), should be detectable and recognizable by hearing machines as individual events (Chu et al. [14]). Given the importance of relating the nature of the signal to the type of extracted features, we detail the primary characteristics of the three most frequent types of signals involved in machine hearing applications: speech, music and environmental sounds.