Abstract

Computational auditory scene analysis is increasingly presented in the literature as a set of auditory-inspired techniques for estimating “Ideal Binary Masks” (IBMs), i.e., time-frequency-domain segregations of the attended source and the acoustic background based on a local signal-to-noise ratio objective (Wang and Brown, 2006). This talk argues that although IBMs may be a useful stand-in when evaluating signal-processing systems, they can provide a misleading perspective when considering models of auditory cognition. First, there is no evidence that human cognition computes or requires an explicit binary mask representation, ideal or otherwise. Second, evaluating an IBM requires artificially mixed acoustic scenes, since only these provide access to the ground-truth mask, and systems that work well on artificially mixed scenes may fail to generalize to real data. The danger of predicting real-world performance from results obtained on artificial mixtures is seen in an analysis of systems submitted to the recent CHiME distant-microphone speech recognition challenges, which evaluate on both types of data (http://spandh.dcs.shef.ac.uk/chime). It is argued that, rather than presuming specific internal representations, auditory scene analysis systems are best evaluated by direct comparison of human and machine percepts, e.g., in the case of a speech recognition task, comparison of human and machine transcriptions at a phonetic level.
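For concreteness, the minimal sketch below illustrates the standard IBM construction described by Wang and Brown (2006): a time-frequency unit is retained when its local SNR exceeds a local criterion (LC) threshold, with 0 dB a common choice. The function name and default threshold here are illustrative assumptions, not specifics from the talk; note that the sketch requires the target and noise signals separately, which is exactly why ground truth exists only for artificial mixtures.

    import numpy as np
    from scipy.signal import stft

    def ideal_binary_mask(target, noise, fs, lc_db=0.0):
        # Short-time Fourier transforms of the two pre-mix sources;
        # access to these separate signals is the "ground truth"
        # that only artificially mixed scenes can provide.
        _, _, T = stft(target, fs=fs)
        _, _, N = stft(noise, fs=fs)
        # Local SNR in each time-frequency unit (epsilon guards
        # against division by zero in silent units).
        eps = 1e-12
        snr_db = 10.0 * np.log10((np.abs(T) ** 2 + eps) /
                                 (np.abs(N) ** 2 + eps))
        # Retain the units where the target dominates the background.
        return (snr_db > lc_db).astype(np.float32)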
