Natural soundscapes of everyday life, such as a conversation at a crowded get-together or in a noisy environment, challenge our proficiency in organizing sounds into perceptually meaningful sequences. Music may tax these processing capabilities even more, as it presents acoustic scenes with a large number of concurrent sound sources. Yet, when listening to music we are able to organize the complex soundscape into streams, segregate foreground and background, recognize voices, melodies, patterns, and motifs, and switch our attention between different aspects of a piece. Auditory stream segregation, the perceptual process underlying this capability, has fascinated researchers for many years, resulting in numerous studies exploring its mechanisms and determinants. In a nutshell (for a detailed review, see Moore & Gockel, 2002), the segregation of a complex audio signal into streams can occur on the basis of many different acoustic cues (Van Noorden, 1975); it is assumed to rely on processes at multiple levels of the auditory system, and it reflects a number of different processes, some stimulus-driven and others of a more general cognitive nature, that is, involving attention and/or knowledge (Bregman, 1994).

Electrophysiological indices of auditory stream segregation have been identified in several approaches (Sussman, 2005; Sussman, Horvath, Winkler, & Orr, 2007; Winkler, Takegata, & Sussman, 2005; Yabe et al., 2001; for an overview, see Snyder & Alain, 2007). One line of research focused on the mismatch negativity (MMN) as a neural index of a distinct perceptual state of stream segregation, constructing tone sequences such that only a perceptual segregation into two streams would allow an MMN-generating sound pattern to emerge. Following a similar principle, neural steady-state responses measured with magnetoencephalography (MEG) were found to reflect the formation of separate streams (Chakalov, Draganova, Wollbrink, Preissl, & Pantev, 2013). Using electroencephalography (EEG), an influence of the frequency separation of consecutive tones on the amplitude of the N1-P2 complex was reported (Gutschalk et al., 2005; Snyder, Alain, & Picton, 2006). Critically, this trend correlated with the perception of streaming in individual participants; a similar effect was reported for the N1 component.

This suggests that the amplitudes of early auditory event-related potential (ERP) components such as the N1-P2 complex can be informative about the perceptual state with respect to the segregation or coherence of complex auditory stimuli. Because the N1-P2 complex, as a sensory-obligatory auditory-evoked potential, can be elicited without imposing a complex structure (e.g., an oddball paradigm) on the stimulus material, it is a promising tool for investigating auditory stream segregation in more naturalistic listening scenarios.
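To make the streaming paradigm concrete: the classic stimulus in this line of research (Van Noorden, 1975) is a repeating ABA- triplet sequence in which the frequency separation between the A and B tones determines whether listeners hear one coherent stream or two segregated ones. The sketch below generates such a sequence; it is a minimal illustration of the paradigm, not the stimulus code of any cited study, and all parameter values (tone frequencies, durations, sampling rate) are arbitrary assumptions.

```python
# Minimal sketch of a Van Noorden-style ABA- triplet sequence.
# All parameter values are illustrative assumptions, not taken
# from any of the cited studies.
import numpy as np

fs = 44100             # sampling rate (Hz), assumed
tone_dur = 0.1         # tone duration (s)
f_a = 500.0            # frequency of the A tones (Hz)
delta_f_semitones = 7  # A-B separation; larger values favor segregation
f_b = f_a * 2 ** (delta_f_semitones / 12)

def tone(freq, dur, fs):
    """Pure tone with a 10 ms raised-cosine on/off ramp."""
    t = np.arange(int(dur * fs)) / fs
    x = np.sin(2 * np.pi * freq * t)
    ramp = int(0.01 * fs)
    win = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    x[:ramp] *= win
    x[-ramp:] *= win[::-1]
    return x

silence = np.zeros(int(tone_dur * fs))
# One ABA- triplet: A tone, B tone, A tone, silent gap.
triplet = np.concatenate([tone(f_a, tone_dur, fs),
                          tone(f_b, tone_dur, fs),
                          tone(f_a, tone_dur, fs),
                          silence])
sequence = np.tile(triplet, 20)  # 20 repetitions, ~8 s of stimulus
```

With a small frequency separation, such triplets tend to be heard as a single galloping rhythm; increasing the separation promotes the percept of two separate isochronous streams, which is the manipulation the N1-P2 studies above exploit.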
In the domain of speech processing, cortical onset responses that follow changes in the waveform envelope (termed envelope-following responses [EFRs]) have long been a target of interest (Kuwada, Batra, & Maher, 1986; Purcell, John, Schneider, & Picton, 2004; Aiken & Picton, 2006). Several approaches for extracting EFRs from continuous EEG or MEG in naturalistic listening scenarios have been proposed (Aiken & Picton, 2008; Kerlin, Shahin, & Miller, 2010; Lalor, Power, Reilly, & Foxe, 2009; Lalor & Foxe, 2010; O'Sullivan, 2015). These methods have provided a distinct picture of the brain signals following the speech waveform envelope and, in particular, have been used to study the human cocktail party problem of understanding speech in noisy settings. In the domain of music processing, a marked reflection of the sound envelope has been detected in the EEG elicited by short segments of naturalistic music (Schaefer, Farquhar, Blokland, Sadakata, & Desain, 2011). Unsupervised approaches (Cong et al., 2012; Thompson, 2013) have confirmed that note onsets leave a consistent reflection in the listener's EEG across subjects and stimuli. …
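As an illustration of the general envelope-following logic shared by these approaches, the sketch below extracts a stimulus envelope via the Hilbert transform, downsamples it to the EEG rate, and cross-correlates it with a single EEG channel. This is a deliberately simplified stand-in for the cited methods (which use, e.g., regularized regression or spatial filters); the variable names, sampling rates, and placeholder random signals are all assumptions.

```python
# Illustrative sketch: estimating an envelope-following response by
# cross-correlating the stimulus envelope with continuous EEG.
# All names and parameters are hypothetical, not taken from the
# cited studies; the signals here are random placeholders.
import numpy as np
from scipy.signal import hilbert, resample_poly

fs_audio = 44100   # assumed audio sampling rate (Hz)
fs_eeg = 250       # assumed EEG sampling rate (Hz)

# Placeholder signals: 10 s of audio and one EEG channel.
audio = np.random.randn(fs_audio * 10)
eeg = np.random.randn(fs_eeg * 10)

# 1. Extract the waveform envelope via the Hilbert transform.
envelope = np.abs(hilbert(audio))

# 2. Downsample the envelope to the EEG sampling rate.
envelope = resample_poly(envelope, fs_eeg, fs_audio)

# 3. Z-score both signals so the cross-correlation is scale-free.
envelope = (envelope - envelope.mean()) / envelope.std()
eeg = (eeg - eeg.mean()) / eeg.std()

# 4. Cross-correlate at lags 0..500 ms; with real data, a peak at
#    positive lags would indicate cortical envelope tracking.
max_lag = int(0.5 * fs_eeg)
n = min(len(envelope), len(eeg))
lags = np.arange(max_lag + 1)
xcorr = np.array([
    np.dot(envelope[: n - lag], eeg[lag:n]) / (n - lag)
    for lag in lags
])
print("peak lag (ms):", 1000 * lags[np.argmax(np.abs(xcorr))] / fs_eeg)
```

With real recordings, a cross-correlation peak at lags of roughly 100-200 ms would be consistent with cortical envelope tracking; the regression-based methods cited above generalize this idea to full temporal response functions estimated across channels.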