Abstract

Studies suggest that long-term music experience enhances the brain’s ability to segregate speech from noise. However, evidence for this musician “speech-in-noise (SIN) benefit” comes largely from simple figure-ground tasks rather than from competitive, multi-talker scenarios that offer realistic spatial cues for segregation and engage binaural processing. We aimed to investigate whether musicians show perceptual advantages in cocktail party speech segregation in a competitive, multi-talker environment. We used the coordinate response measure (CRM) paradigm to measure speech recognition and localization performance in musicians vs. non-musicians in a simulated 3D cocktail party environment conducted in an anechoic chamber. Speech was delivered through a 16-channel speaker array distributed around the horizontal soundfield surrounding the listener. Participants recalled the color, number, and perceived location of target callsign sentences. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (0, 1, 2, 3, 4, 6, or 8 competing talkers). Musicians obtained faster and more accurate speech recognition amid up to eight simultaneous talkers and showed less decline in performance with increasing interferers than their non-musician peers. Correlations revealed associations between listeners’ years of musical training, CRM recognition, and working memory. However, better working memory also correlated with better speech streaming. Basic (QuickSIN) but not more complex (speech streaming) SIN processing was still predicted by music training after controlling for working memory. Our findings confirm a relationship between musicianship and naturalistic cocktail party speech streaming but also suggest that cognitive factors at least partially drive musicians’ SIN advantage.

Highlights

  • In naturalistic sound environments, the auditory system must extract target speech and simultaneously filter out extraneous sounds for effective communication – the classic “cocktail party problem” (Cherry, 1953; Bregman, 1978; Yost, 1997).

  • We found a group × masker interaction on target speech recognition accuracy [F(5, 130) = 4.48, p = 0.0008, ηp² = 0.15; Figure 2A]. This interaction was attributable to the change in performance from zero to eight talkers being shallower in musicians compared to nonmusicians [Figure 2A, inset; t(26) = 3.84, p = 0.0007, d = 1.45]. This suggests that musicians were less challenged by cocktail party speech recognition with an increasing number of interfering talkers.

  • By measuring speech recognition in a multi-talker soundscape, we show that trained musicians are superior to their nonmusician peers in deciphering speech within a naturalistic cocktail party environment.
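As a sanity check on the statistics reported above, the effect sizes can be recovered directly from the test statistics and their degrees of freedom. The sketch below is illustrative only: the group sizes (n = 14 per group) are an assumption inferred from the reported df = 26 for the two-sample contrast, not a figure stated in this excerpt.

```python
import math

def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (F * df_effect) / (F * df_effect + df_error)

def cohens_d_from_t(t, n1, n2):
    """Cohen's d for an independent-samples t test (n1, n2 assumed here)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Group x masker interaction: F(5, 130) = 4.48
eta = partial_eta_squared(4.48, 5, 130)   # ~0.15, matching the report

# Musician vs. nonmusician slope contrast: t(26) = 3.84,
# assuming 14 listeners per group (df = n1 + n2 - 2 = 26)
d = cohens_d_from_t(3.84, 14, 14)         # ~1.45, matching the report

print(round(eta, 2), round(d, 2))
```

Both values reproduce the reported ηp² = 0.15 and d = 1.45, so the effect sizes are internally consistent with the test statistics.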


Introduction

The auditory system must extract target speech and simultaneously filter out extraneous sounds for effective communication – the classic “cocktail party problem” (Cherry, 1953; Bregman, 1978; Yost, 1997). Musically trained individuals are highly sensitive to changes in auditory space (Münte et al., 2001) and to voice pitch (Bidelman et al., 2011), and are better than their non-musician peers at detecting inharmonicity in sound mixtures (Zendel and Alain, 2009). These features are prominent cues that signal the presence of multiple acoustic sources (Popham et al., 2018), and musicians excel at exploiting them.
