Evaluating speech intelligibility across reverberant conditions in university dining halls
Background noise and reverberation can impede speech intelligibility. Clear speech is particularly important in communal spaces such as dining halls, yet these venues often have acoustic conditions poorly suited to conversation, with high levels of background noise and reverberation. The present study examines how well listeners understand speech in noise in four campus dining halls at the University of Illinois, Urbana-Champaign. A speech intelligibility test was administered for the four dining halls using binaural room impulse responses (BRIRs) captured with two Head and Torso Simulators (HATS). These BRIRs were convolved with recorded speech stimuli (QuickSIN) to replicate real listening conditions. While it is well established that lower reverberation times improve speech intelligibility, this study shows how targeted acoustic interventions can enhance communication within university dining halls.
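The BRIR-convolution step described in the abstract can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the sample rate and the exponentially decaying noise stand-ins for a measured BRIR pair and a QuickSIN sentence are synthetic placeholders.

```python
import numpy as np

fs = 48_000  # sample rate in Hz (assumed)
rng = np.random.default_rng(0)

# Synthetic stand-ins: a decaying-noise pair for the left/right BRIRs and one
# second of "speech". In the study these would be the measured HATS responses
# and the recorded QuickSIN sentences.
decay = np.exp(-np.linspace(0, 8, fs // 2))
brir_left = rng.standard_normal(fs // 2) * decay
brir_right = rng.standard_normal(fs // 2) * decay
speech = rng.standard_normal(fs)

# Convolve the dry speech with each ear's impulse response and stack the
# results into a two-channel (binaural) stimulus for headphone playback.
left = np.convolve(speech, brir_left)
right = np.convolve(speech, brir_right)
binaural = np.stack([left, right], axis=1)
```

The full convolution yields `len(speech) + len(brir) - 1` samples per channel; in practice the result would be level-normalized before presentation.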
- Research Article
- 10.5075/epfl-thesis-4643
- Jan 1, 2010
Binaural room impulse responses (BRIRs) characterize the transfer of sound from a source in a room to the left and right ear entrances of a listener. Applying BRIRs to sound source signals enables headphone listening with the perception of a three-dimensional auditory image. BRIRs are usually linear filters of several hundred milliseconds to several seconds in length; their waveforms therefore contain a vast amount of information. This thesis studies the modeling of BRIRs with a reduced set of parameters. It is shown that late BRIR tails can be modeled perceptually accurately by considering only the time-frequency energy decay relief and the frequency-dependent interaural coherence (IC). This insight into BRIR modeling enables a number of algorithms with advantages over the previous state of the art. Three such algorithms are proposed. The first makes it possible to obtain BRIRs by measuring room properties and listener properties separately, vastly reducing the number of measurements needed to obtain listener-specific BRIRs for a number of listeners and rooms. The listener properties are measured as a head-related transfer function (HRTF) set and the room properties are measured as a B-format¹ room impulse response (RIR). It is shown how to combine the HRTF set of the listener with a B-format RIR to obtain BRIRs for that room individualized for the listener. This technique exploits the insight on BRIR perception by computing the BRIR tail as a frequency-dependent linear combination of B-format channels, designed to obtain the desired energy decay relief and interaural coherence. A serious problem with convolving sound source signals with BRIRs is the computational complexity of implementing long BRIRs as finite impulse response (FIR) filters.
Inspired by the perceptual experiments on BRIR tails, a modified Jot reverberator is proposed that simulates BRIR tails with the desired frequency-dependent interaural coherence while requiring significantly less computational power than direct application of BRIRs. Also inspired by the perception of BRIRs, an extension of this reverberator is proposed that efficiently models both the reverberation tail with the correct coherence and distinct early reflections, using two parallel feedback delay networks. If stereo signals are played back over headphones, unnatural binaural cues are presented to the listener, e.g. interaural level difference (ILD) changes not accompanied by corresponding interaural time difference (ITD) changes, or diffuse sound with unnatural IC. To simulate stereo listening in a room and avoid these unnatural cues, BRIRs can be applied to the left and right stereo channels. Besides the computational complexity of applying the BRIR filters, this technique has a number of disadvantages: the room associated with the BRIRs is imposed on the stereo signal, which usually already contains reverberation, so applying BRIRs changes the reverberation time and introduces coloration. A technique is proposed in which the direct sound is rendered using data extracted from HRTFs and the ambient sound contained in the stereo signal is modified such that its coherence matches that of a binaural recording of diffuse sound, without modifying its spectrum. Implementations of reverberators based on general feedback delay networks (e.g. Jot reverberators) can require a high number of operations to implement the so-called feedback matrix. For applications where the number of channels needs to be high, such as decorrelators, this can pose a real problem. Special types of matrices are known that can be implemented efficiently because their elements have the same magnitude.
However, the complexity can also be reduced by introducing many zero elements. Different types of such sparse feedback matrices are proposed and tested for their suitability in Jot reverberators. A highly efficient feedback matrix is obtained by combining both approaches, choosing the nonzero elements of a sparse matrix from efficiently implementable Hadamard matrices.

¹ B-format refers to a 4-channel signal recorded with four coincident microphones: one omni and three dipole microphones pointing in orthogonal directions.
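As a rough illustration of the structure this abstract discusses, the following sketch implements a small Jot-style feedback delay network whose feedback matrix is a scaled Hadamard matrix (orthonormal after division by sqrt(n), so a loop gain g < 1 guarantees stability). The delay lengths, gain, and plain output sum are illustrative choices, not the thesis's parameters.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def fdn(x, delays, g=0.7):
    """Tiny feedback delay network: n delay lines coupled through a scaled
    Hadamard feedback matrix. g < 1 makes the loop a contraction (stable)."""
    n = len(delays)
    A = g * hadamard(n) / np.sqrt(n)
    bufs = [np.zeros(d) for d in delays]   # circular delay-line buffers
    idx = [0] * n
    y = np.zeros_like(x)
    for t, s in enumerate(x):
        outs = np.array([bufs[i][idx[i]] for i in range(n)])  # delay-line outputs
        y[t] = outs.sum()
        fb = A @ outs                       # mix outputs through feedback matrix
        for i in range(n):
            bufs[i][idx[i]] = s + fb[i]     # write input + feedback back in
            idx[i] = (idx[i] + 1) % delays[i]
    return y

# Impulse response of the network: a dense, decaying reverberation tail.
impulse = np.zeros(4000)
impulse[0] = 1.0
ir = fdn(impulse, delays=[149, 211, 263, 293])
```

Mutually prime delay lengths are the usual choice, as they spread the echo pattern and avoid coinciding repetitions.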
- Conference Article
- 10.25836/sasp.2019.31
- Sep 6, 2019
- HAL (Le Centre pour la Communication Scientifique Directe)
A basic building block in audio for Augmented Reality (AR) is the use of virtual sound sources layered on top of any real sources present in an environment. In order to perceive these virtual sources as belonging to the natural scene, it is important to match their acoustic parameters to those of a real source with the same characteristics, i.e. radiation properties, sound propagation, and head-related impulse response (HRIR). However, it is still unclear to what extent these parameters need to be matched in order to generate plausible scenes in which virtual sound sources blend seamlessly with real sound sources. This contribution presents an auralization framework that allows prototyping of augmented reality scenarios from measured multichannel room impulse responses to gain a better understanding of the relevance of individual acoustic parameters. A well-established approach for binaural measurement and reproduction of sound scenes is based on capturing binaural room impulse responses (BRIRs) using a head and torso simulator (HATS) and convolving these BRIRs dynamically with audio content according to the listener's head orientation. However, such measurements are laborious and time consuming, requiring measuring the scene with the HATS in multiple orientations. Additionally, the HATS HRIR is inherently encoded in the BRIRs, making them unsuitable for personalization to different listeners. The approach presented here consists of the resynthesis and dynamic binaural reproduction of multichannel room impulse responses (RIRs) using an arbitrary HRIR dataset. Using a compact microphone array, we obtained a pressure RIR and a set of auxiliary RIRs, and we applied the Spatial Decomposition Method (SDM) to estimate the direction-of-arrival (DOA) of the different sound events in the RIR. The DOA information was used to map sound pressure to different locations by means of an HRIR dataset, generating a binaural room impulse response (BRIR) for a specific orientation.
By either rotating the DOA or the HRIR dataset, BRIRs for any direction may be obtained. Auralizations using SDM are known to whiten the spectrum of late reverberation. Available alternatives such as time-frequency equalization were not feasible in this case, as a different time-frequency filter would be necessary for each direction, resulting in a non-homogeneous equalization of the BRIRs. Instead, the resynthesized BRIRs were decomposed into sub-bands and the decay slope of each sub-band was modified independently to match the reverberation time of the original pressure RIR. In this way we could apply the same reverberation correction factor to all BRIRs. In addition, we used a direction-independent equalization to correct for timbral effects of equipment, HRIR, and signal processing. Real-time reproduction was achieved by means of a custom Max/MSP patch, in which the direct sound, early reflections, and late reverberation were convolved separately to allow real-time changes in the time-energy properties of the BRIRs. The mixing time of the reproduced BRIRs is configurable and a single direction-independent reverberation tail is used. To evaluate the quality of the resynthesis method in a real room, we conducted both objective and perceptual comparisons for a variety of source positions. The objective analysis was performed by comparing real measurements of a KEMAR mannequin with the resynthesis at the same receiver location using a simulated KEMAR HRIR. Typical room acoustic parameters of the real and resynthesized acoustics were found to be in good agreement. The perceptual validation consisted of a comparison of a loudspeaker and its resynthesized counterpart. Non-occluding headphones with individual equalization were used to ensure that listeners were able to listen simultaneously to the real and the virtual samples. Subjects were allowed to listen to the sounds for as long as they desired and could freely switch between the real and virtual stimuli in real time.
The integration of an Optitrack motion tracking system allowed us to present world-locked audio, accounting for head rotations. We present here the results of this listening test (N = 14) in three sections: discrimination, identification, and qualitative ratings. Preliminary analysis revealed that under these conditions listeners were generally able to discriminate between real and virtual sources and were able to consistently identify which of the presented sources was real and which was virtual. The qualitative analysis revealed that timbral differences are the most prominent cues for discrimination and identification, while spatial cues are well preserved. All the listeners reported good externalization of the binaural audio. Future work includes extending the presented validation to more environments, as well as implementing tools to arbitrarily modify BRIRs in the spatial, temporal, and frequency domains in order to study the perceptual requirements of room acoustics reproduction in AR.
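The sub-band decay correction mentioned in this abstract can be illustrated with a one-band sketch: given the current and target reverberation times, the band is multiplied by an exponential that cancels the old decay slope and imposes the new one. The function name and the synthetic test signal below are illustrative, not the authors' implementation.

```python
import numpy as np

def match_t60(band_ir, fs, t60_current, t60_target):
    """Rescale the exponential decay of one sub-band impulse response so that
    its reverberation time changes from t60_current to t60_target.
    A 60 dB amplitude decay over T60 corresponds to a log-domain decay
    rate of 3*ln(10)/T60 per second."""
    t = np.arange(len(band_ir)) / fs
    rate_cur = 3 * np.log(10) / t60_current
    rate_tgt = 3 * np.log(10) / t60_target
    # Multiplying by exp((rate_cur - rate_tgt) * t) removes the old slope
    # and leaves an envelope decaying at the target rate.
    return band_ir * np.exp((rate_cur - rate_tgt) * t)

# Demo: a pure exponential with T60 = 0.5 s stretched to T60 = 1.0 s.
fs = 8000
t = np.arange(fs) / fs
band = np.exp(-3 * np.log(10) / 0.5 * t)
corrected = match_t60(band, fs, 0.5, 1.0)
```

Applying the same correction per sub-band, as the abstract describes, counteracts the spectral whitening of the late reverberation without per-direction filtering.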
- Research Article
- 10.1016/j.bbr.2003.12.006
- Feb 1, 2004
- Behavioural Brain Research
Background noise does not modify song-induced genic activation in the bird brain
- Research Article
- 10.1121/1.4785520
- Oct 1, 2004
- The Journal of the Acoustical Society of America
Computer model studies were used to predict qualitative and quantitative measures of speech intelligibility in classrooms under realistic conditions of background noise and reverberation. Fifteen different acoustical measurements related to speech intelligibility were made at multiple locations in three actual classrooms and in computer models of the classrooms. Speech intelligibility (MRT) tests were given to human subjects in each of the actual classrooms at five signal-to-noise ratios. Speech intelligibility tests were also prepared from aural simulations obtained by convolving anechoic speech tracks with impulse responses obtained in the computer models. Correlations (R²) between acoustical measures made in the full-size classrooms and in the computer models of the classrooms of 0.92 to 0.99, with standard errors of 0.033 to 7.311, were found. The scores on the speech intelligibility tests given in the actual rooms in the five noise conditions were closely duplicated in the equivalent tests conducted in a sound booth using the simulated speech signals obtained in the computer models. Both quantitative and qualitative measures of speech intelligibility in the actual rooms were accurately predicted in the computer models.
- Research Article
- 10.1016/j.apacoust.2010.11.012
- Dec 21, 2010
- Applied Acoustics
Perceptual validation of virtual room acoustics: Sound localisation and speech understanding
- Research Article
- 10.1121/10.0005756
- Aug 1, 2021
- The Journal of the Acoustical Society of America
Much can be learned by investigating the click trains of odontocetes, including estimating the number of vocalizing animals and comparing the acoustic behavior of different individuals. Analyzing such information gathered from groups of echolocating animals in a natural environment is complicated by two main factors: overlapping echolocation produced by multiple animals at the same time, and varying levels of background noise. Starkhammar et al. [(2011a). Biol. Lett. 7(6), 836-839] described an algorithm that measures and compares the frequency spectra of individual clicks to identify groups of clicks produced by different individuals. This study presents an update to this click group separation algorithm that improves performance by comparing multiple click characteristics. There is a focus on reducing error when high background noise levels cause false click detection and recordings are of a limited frequency bandwidth, making the method applicable to a wide range of existing datasets. This method was successfully tested on recordings of free-swimming foraging dolphins with both low and high natural background noise levels. The algorithm can be adjusted via user-set parameters for application to recordings with varying sampling parameters and to species of varying click characteristics, allowing for estimates of the number of echolocating animals in free-swimming groups.
- Research Article
- 10.18453/rosdok_id00002434
- Jan 1, 2019
The practical measurement of binaural room impulse responses (BRIRs) is impaired by equipment and background noise. The noise level in the BRIR depends, among other factors, on the duration of the excitation signal. With the increasing use of fast measurement techniques, e.g. for the measurement of individual BRIRs, the question arises as to which level of noise is perceptually acceptable. This paper presents a perceptual study on the threshold for detecting noise in BRIRs. As a reference, a set of BRIRs is measured with a high peak-to-noise ratio (PNR) obtained by long-term averaging. To generate the stimuli, these BRIRs are impaired by additive white Gaussian noise at different levels. The BRIRs are convolved with a speech stimulus and presented to the subjects in a 3AFC listening test with a 2-up-1-down rule. The results of the perceptual experiment are statistically analysed and discussed.
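A minimal sketch of the stimulus-generation step described here, i.e. impairing a clean impulse response with white Gaussian noise at a chosen peak-to-noise ratio. The function name and the scaling convention (peak amplitude over noise RMS, in dB) are assumptions for illustration, not the paper's exact definition of PNR.

```python
import numpy as np

def impair_with_noise(ir, pnr_db, rng=None):
    """Add white Gaussian noise so that the ratio of the impulse response's
    peak amplitude to the noise RMS equals pnr_db (one plausible PNR reading)."""
    rng = rng or np.random.default_rng(0)
    peak = np.max(np.abs(ir))
    noise_rms = peak / 10 ** (pnr_db / 20)   # dB -> linear amplitude ratio
    return ir + rng.standard_normal(len(ir)) * noise_rms

# Demo: a unit impulse impaired at 40 dB PNR (noise RMS = 0.01).
clean = np.zeros(20000)
clean[0] = 1.0
noisy = impair_with_noise(clean, 40.0)
```

Sweeping `pnr_db` across conditions and convolving each impaired IR with the speech stimulus would produce the kind of stimulus set the adaptive 3AFC procedure operates on.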
- Research Article
- 10.1016/j.cub.2017.09.014
- Oct 19, 2017
- Current Biology
Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise
- Research Article
- 10.1121/10.0023135
- Oct 1, 2023
- The Journal of the Acoustical Society of America
Clear and effective communication is crucial for the safe operation of aircraft. In helicopters, high levels of noise are generated by the engine, gears, and aerodynamics, which negatively impact speech intelligibility. To address this issue, modern aircraft headsets utilize active noise control (ANC) to reduce noise levels for both crew and passengers. However, the speech signals captured by these headsets often contain high levels of background noise, hindering internal and external flight communication. This paper introduces a dual-microphone, dual-stage speech enhancement algorithm that combines basic spectral subtraction with a Wiener filter enhanced by the a priori and a posteriori signal-to-noise ratios. Audio data from within a helicopter cabin were recorded during a test flight. In a series of simulations, the Wiener filter implementation is compared to other algorithms based only on spectral subtraction methods. The results are evaluated using established performance measures for speech quality. The Wiener filter implementation yields the highest speech quality and is therefore implemented on an FPGA platform for validation in a laboratory experiment. The simulations and measurements demonstrate significant improvements in speech quality and, consequently, enhanced speech intelligibility using the proposed method.
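For illustration, a single-channel, single-frame version of the Wiener gain this abstract refers to might look like the sketch below. The dual-microphone structure, the decision-directed a priori SNR tracking, and the FPGA details of the actual algorithm are omitted, and the flooring constant is an assumption.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, snr_floor=1e-3):
    """Per-bin Wiener filter gain G = xi / (1 + xi), where the a priori SNR xi
    is crudely approximated by the floored (a posteriori SNR - 1).
    A full implementation would track xi with a decision-directed update."""
    snr_post = noisy_psd / np.maximum(noise_psd, 1e-12)  # a posteriori SNR
    xi = np.maximum(snr_post - 1.0, snr_floor)           # rough a priori SNR
    return xi / (1.0 + xi)

# Demo: a bin dominated by speech keeps its energy; a noise-only bin is attenuated.
gains = wiener_gain(np.array([100.0, 1.0]), np.array([1.0, 1.0]))
```

The gains would be applied to the short-time spectrum of the noisy signal bin by bin before resynthesis; spectral subtraction differs mainly in using a subtractive rather than multiplicative gain rule.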
- Research Article
- 10.1121/1.4782128
- May 1, 2007
- The Journal of the Acoustical Society of America
Classrooms in India, especially those located in warm, humid areas, are generally equipped with mechanical ventilation such as fans or air-conditioning systems, which influence the background noise in these classrooms. These systems lead to high levels of background noise even under unoccupied conditions. Ventilation requirements become acute in summer, and rooms are provided with many open windows; this is a problem where external noise levels are high. An acoustical analysis was carried out to assess the acoustical quality and the conditions for speech communication. Speech intelligibility tests, along with physical and acoustical measurements, were made in various classrooms on the IIT Madras campus. Octave-band measurements of background noise levels and reverberation times were made, and the measured values were compared with existing standards such as ANSI. Preliminary surveys were administered to students and instructors to obtain subjective opinions of their experiences in these environments. The classrooms comprised untreated, partially treated, and fully treated rooms. The inferences from this study are used to suggest better design procedures for such classroom environments.
- Research Article
- 10.1016/j.apacoust.2023.109517
- Jul 5, 2023
- Applied Acoustics
The effect of listener head orientation on front-to-rear speech intelligibility in an automotive cabin
- Research Article
- 10.1038/s41598-022-10414-6
- Apr 21, 2022
- Scientific Reports
Dining establishments are an essential part of the social experience. However, they are often characterized by high levels of background noise, which represent a barrier to effective communication. This particularly affects people with hearing problems. Moreover, noise levels exceeding normal conversational levels cause a phenomenon called the Lombard effect, an involuntary tendency to increase vocal effort when talking in the presence of noise. Adults over 60 years represent the second largest population group in the US, and the majority of them suffer from some degree of hearing loss. The primary aim of the current study was to understand the effect of noise on vocal effort and speech intelligibility in a restaurant setting for adults over 60 years old with and without hearing loss. The secondary aim was to evaluate their perceived disturbance in communication and whether their willingness to spend time and money in a restaurant was affected by varying levels of background noise. The results of this study showed that background noise levels below 50 dB(A) allow senior customers to minimize their vocal effort and maximize their understanding of conversations, even for those with moderate to severe hearing loss. Such a limit would also keep perceived disturbance low and willingness to spend time and money high among dining patrons.
- Research Article
- 10.1121/1.4781405
- Nov 1, 2006
- The Journal of the Acoustical Society of America
Several studies have found that high background noise levels are detrimental to health parameters. In particular, this seems to apply to developing voices, where future vocal habits are established. Thus, it is important to study vocal function and environmental effects on the developing child voice. This study analyzed the effects of background noise on children's voices, specifically vocal intensity and fundamental frequency. The investigated vocal parameters were (1) the relationship of background noise levels to F0 and vocal intensity, (2) F0 and vocal intensity variations over the day, and (3) F0 perturbation variations over the day. Ten 5-year-old children from three day-cares participated: six boys and four girls. The audio signal was recorded by two microphones mounted in front of the subjects' ears. By adding these signals it is possible to separate the voice from the background noise. The material analyzed contained data from three 60-min recordings per child, from morning, noon, and afternoon, during a normal day at the day-care. Generally high mean background noise levels were found (82.6 dBA). Preliminary results suggest a correlation between high background noise and high F0 and vocal intensity in the children's voices, particularly for boys. F0 perturbation tends to increase over the day.
- Research Article
- 10.3766/jaaa.25.6.10
- Jun 1, 2014
- Journal of the American Academy of Audiology
The acceptable noise level (ANL) test is the only test known to predict success with hearing aids with a high degree of accuracy. A person's ANL is the maximum amount of background noise that he or she is "willing to put up with" while listening to running speech. It is defined as the speech level minus the noise level, in decibels (dB). People who are willing to put up with high levels of background noise are generally successful hearing-aid wearers, whereas people who are not are generally unsuccessful hearing-aid wearers. If it were known which cues listeners use to decide how much background noise they are willing to tolerate, then it might be possible to create technology that reduces these cues and improves listeners' chances of success with hearing aids. As a first step toward this goal, this study investigated whether listeners use loudness as a cue to determine their ANLs. Research Design and Study Sample: Twenty-one individuals with normal hearing and 21 individuals with sensorineural hearing loss participated in this study. In each group of 21 participants, 7 had a low ANL (<7 dB), 7 had a mid ANL (7-13 dB), and 7 had a high ANL (>13 dB). Participants performed a modified version of the ANL test in which the speech was fixed at four different levels (50, 63, 75, and 88 dBA), and participants adjusted the background noise (multitalker babble) to the maximum level at which they were willing to listen while following the speech. These results were compared with participants' equal-loudness contours for the multitalker babble in the presence of speech. Equal-loudness contours were measured by having the participants perform a loudness-matching task in which they matched the level of the background noise (multitalker babble), played concurrently with speech, to a reference condition (also multitalker babble). During the test condition, the speech was played at 50, 63, 75, or 88 dBA.
All testing was performed in a sound booth with the speech and the noise presented from a loudspeaker at 0° azimuth, 3 feet in front of the participant. Each condition was presented multiple times and the results were averaged. Presentation order was randomized. Participants were tested unaided. Participants' ANLs were compared with their equal-loudness contours for the background noise. ANLs that ran parallel to the equal-loudness contours were considered consistent with a loudness-based listening strategy. This pattern was observed for only two participants, both hearing-impaired. The majority of listeners showed no consistent trend between their ANLs and their loudness-matched data, suggesting that they use cues other than loudness to determine their ANLs. ANLs were consistent with loudness-matched data for only a small subset of listeners, suggesting that these listeners may use loudness as a cue for determining their ANLs.
- Research Article
- 10.1121/1.4787379
- Apr 1, 2005
- The Journal of the Acoustical Society of America
Auralizations and other computer model studies were used to predict qualitative and quantitative measures of speech intelligibility in classrooms under realistic conditions of background noise and reverberation. Speech intelligibility tests were given to college students in two classrooms and one racquetball court at five signal-to-noise ratios. Auralizations of the speech intelligibility tests were made from the computer models, and the tests were then administered in a sound booth using the auralized material. Fifteen different acoustical measurements related to speech intelligibility were also made at multiple locations in the actual classrooms and in the computer models of the classrooms. The scores on the speech intelligibility tests given in the actual rooms in the five noise conditions were closely duplicated in the equivalent tests conducted in a sound booth using the simulated speech signals obtained in the computer models. Both quantitative and qualitative measures of speech intelligibility in the actual rooms were accurately predicted in the computer models. Correlations (R²) between acoustical measures made in the full-size classrooms and in the computer models of the classrooms of 0.92 to 0.99 were found.