Introduction
Human beings are exposed every day to complex acoustic environments that pose challenges even for individuals with normal hearing. Speech perception depends not only on fixed elements of the acoustic wave but also on factors such as speech intensity, environmental noise, the presence of other speakers, individual-specific characteristics, spatial separation of sound sources, ambient reverberation, and audiovisual cues. The objective of this study is twofold: to determine the auditory capacity of normal-hearing individuals to discriminate spoken words in real-life acoustic conditions, and to perform a phonetic analysis of the misunderstood spoken words.
Materials and methods
This is a descriptive, observational, cross-sectional study involving 20 normal-hearing individuals. Verbal audiometry was conducted in an open-field environment, with the words masked by simulated real-world acoustic environments at various sound intensity levels. To complement the sound emission, 2D images related to the sounds were displayed on a television. We analyzed the percentage of correct answers and performed a phonetic analysis of the misunderstood Spanish bisyllabic words in each environment.
Results
The sample comprised 14 women (70%) and 6 men (30%), with a mean age of 26 ± 5.4 years and a mean air-conduction hearing threshold of 10.56 ± 3.52 dB SPL in the right ear and 10.12 ± 2.49 dB SPL in the left ear. The percentage of verbal discrimination was 97.2 ± 5.04% in the ‘Ocean’ sound environment, 94 ± 4.58% in ‘Restaurant’, and 86.2 ± 9.94% in ‘Traffic’ (p < 0.001). In the phonetic analysis, the allophones that exhibited statistically significant differences were [o] (p = 0.002) among the vocalic phonemes, [n] (p < 0.001) among the voiced nasal consonants, [ɾ] (p = 0.0016) among the voiced fricatives, and [b] (p < 0.001) and [g] (p = 0.045) among the voiced stops.
Conclusions
The dynamic properties of the acoustic environment can affect the ability of a normal-hearing individual to extract information from a voice signal. Our study demonstrates that this ability decreases when the voice signal is masked by one or more simultaneous interfering voices, as in the ‘Restaurant’ environment, and when it is masked by continuous, intense noise, as in ‘Traffic’. In the phonetic analysis, nasal consonants were particularly difficult to identify when the sound environment consisted of continuous low-frequency noise. In environments with distracting verbal signals, vowels and vibrant consonants showed the worst intelligibility.