Abstract

Speech recognition and interaction are a current research hotspot in artificial intelligence, and research on speech separation algorithms is the top priority within this field. In voice interaction, the target voice must first be accurately recognized in a scene containing noise and multiple speakers before it can be converted into text. Separating and enhancing the voice of a specific speaker, and then recognizing it in a scene with multiple speakers or noise, therefore remains an open problem. This paper proposes a time-domain speech separation and recognition algorithm based on deep learning: the target speech is first separated and enhanced by a convolution-based time-domain audio separation network, and the separated speech is then recognized. The experimental results show that applying the proposed time-domain speech separation algorithm before speech recognition greatly improves the signal-to-noise ratio and recognition accuracy over the baseline model, that training of the model converges quickly, and that the target voice is effectively recognized.

