Abstract

Automatic speech segmentation as an important part of speech recognition system (ASR) is highly noise dependent. Noise is made by changes in the communication channel, background, level of speaking etc. In recent years, many researchers have proposed noise cancelation techniques and have added visual features from speaker's face to reduce the effect of noise on ASR systems. Removing noise from audio signals depends on the type of the noise; so it cannot be used as a general solution. Adding visual features improve this lack of efficiency, but advanced methods of this type need manual extraction of visual features. In this paper we propose a completely automatic system which uses optical flow vectors from speaker's image sequence to obtain visual features. Then, Hidden Markov Models are trained to segment audio signals from image sequences and audio features based on extracted optical flow. The developed segmentation system based on such method acts totally automatic and become more robust to noise.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.