Recording speech is often a matter of proper experimental design to obtain the best signal quality. This notably includes appropriate microphone positioning and gain adjustment. However, some radio communication devices require the microphone to be placed in very close proximity to the mouth, where it captures the respiratory airflow as well as the air bursts from plosive consonants. In this paper, we assess an objective evaluation method for the quality of speech signals captured in such adverse conditions. The method employs an artificial mouth coupled with a steady airflow to compute the Speech Transmission Index (STI). We apply it to the evaluation of four commercially available pop filters. With the microphone alone, adding a 30 L/min airflow reduces the STI from excellent to fair. Soft fabric filters restore the original STI, while it remains very slightly degraded with a metal mesh filter. Finally, the objective metric is compared with the results of a listening test. The perceived quality of simulated samples with the added airflow is consistent with the STI results. However, no significant difference between the filters is found with real speech samples containing plosive consonants and no respiration noise.