Abstract

Over the past several decades, research in signal enhancement and speech recognition has concentrated on single-channel systems and microphone arrays. Single-channel systems require subjects to be relatively close to the microphone, whereas microphone arrays require close spacing and a priori knowledge of the array geometry. In contrast to those stringent conditions, distributed multi-microphones (DMMs) can be used in situations where the microphones are positioned far from the subjects, with wide and possibly unknown spacing and configuration, such as in meeting rooms or in the wild. Rather than performing recognition through microphone selection, feature integration, or likelihood combination, the proposed work processes the DMM signals to diminish the effects of ambient noise and forms one optimal signal before passing it to the recognizer, using two methods: weighted sum of distances and weighted sum of signal powers. Song-type classification experiments are presented on eight-channel Norwegian ortolan bunting (Emberiza hortulana) vocalizations, over a microphone range of 1 to 206 m from the reference, on 1620 recordings, using cepstral coefficients, energy, and time-derivative features. The two methods achieve accuracies of 91.2% (signal powers) and 94.4% (distances), against 90.7% for the closest microphone, on 804 test exemplars divided across four song types.
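
The idea of forming one signal from several channels before recognition can be illustrated with a minimal sketch. The abstract does not define the weights, so everything below is an assumption made for illustration: the power-based weights are taken proportional to each channel's mean-square power, the distance-based weights proportional to the inverse of each microphone's distance, and the channels are assumed to be already time-aligned. The function names (`power_weights`, `distance_weights`, `combine`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def power_weights(channels: Sequence[Sequence[float]]) -> List[float]:
    """Assumed scheme: weight each channel by its mean-square power."""
    return normalize([sum(x * x for x in ch) / len(ch) for ch in channels])


def distance_weights(distances_m: Sequence[float]) -> List[float]:
    """Assumed scheme: weight each channel by the inverse of its distance (m)."""
    return normalize([1.0 / d for d in distances_m])


def combine(channels: Sequence[Sequence[float]],
            weights: Sequence[float]) -> List[float]:
    """Weighted sum across time-aligned channels, sample by sample."""
    n = min(len(ch) for ch in channels)  # truncate to the shortest channel
    return [sum(w * ch[t] for w, ch in zip(weights, channels))
            for t in range(n)]


# Toy two-channel example (made-up samples, not data from the paper).
channels = [[1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5]]
by_power = combine(channels, power_weights(channels))
by_distance = combine(channels, distance_weights([1.0, 3.0]))
```

In this toy case the louder, closer channel dominates under both schemes. Whether the paper's weights take this form, and how distances are obtained when the configuration is unknown, cannot be determined from the abstract alone.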
