Abstract

In assistive hearing applications, identifying and enhancing non-stationary target speech in noisy environments, such as a cocktail party, is an important problem for real-time speech separation. Previous studies mostly used microphone signal processing to perform target speech separation and analysis, for example feature recognition through large amounts of training data and supervised machine learning. Such methods are suitable for stationary noise suppression, but they are relatively limited for non-stationary noise and have difficulty meeting real-time processing requirements. In this study, we propose a real-time speech separation method that combines an optical camera and a microphone array. The method is divided into two stages. Stage 1 uses computer vision with the camera to detect and identify targets of interest and to estimate the source angle and distance. Stage 2 uses beamforming with the microphone array to enhance and separate the target speech. An asynchronous update function integrates the beamforming control and the speech processing to reduce the effect of processing delay. The experimental results show that the noise reduction in stationary and non-stationary noise environments was 6.1 dB and 5.2 dB, respectively. The response time of speech processing was less than 10 ms, which meets the requirements of a real-time system. The proposed method has high potential for application in auxiliary listening systems or in machine language processing, such as intelligent personal assistants.
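As a rough illustration of Stage 2, the following minimal Python sketch shows delay-and-sum beamforming, one common beamforming technique; the abstract does not state which beamformer the authors used. The uniform linear array geometry, the function names, the speed-of-sound value, and the rounding of delays to whole samples are all assumptions made for illustration. The steering angle stands in for the source angle that the camera would estimate in Stage 1.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Arrival delay of each microphone, in whole samples, for a far-field source.

    Assumes a uniform linear array; angle_deg is measured from broadside.
    Delays are shifted so that the earliest-arriving microphone has delay 0.
    """
    theta = math.radians(angle_deg)
    delays = []
    for m in range(num_mics):
        tau = m * spacing_m * math.sin(theta) / SPEED_OF_SOUND  # seconds
        delays.append(round(tau * sample_rate))
    base = min(delays)
    return [d - base for d in delays]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average across channels.

    Signals from the steered direction add coherently; signals from other
    directions are misaligned and are attenuated by the averaging.
    """
    usable = len(channels[0]) - max(delays)
    output = []
    for t in range(usable):
        aligned = [ch[t + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


# Hypothetical demo: simulate a 4-microphone array receiving a tone from 30 degrees.
source = [math.sin(0.3 * t) for t in range(64)]
demo_delays = steering_delays(num_mics=4, spacing_m=0.05, angle_deg=30.0, sample_rate=16000)
demo_channels = [[0.0] * d + source[: len(source) - d] for d in demo_delays]
enhanced = delay_and_sum(demo_channels, demo_delays)
```

In this sketch the beam is re-steered simply by calling `steering_delays` with a new angle, which is the kind of control input the camera stage could supply. A practical system would likely need fractional-sample delays and a more selective beamformer than plain averaging.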

Highlights

  • In recent years, the number of patients suffering from hearing loss has increased year by year. According to the World Health Organization, around 466 million people worldwide have disabling hearing loss, of whom 34 million are children.

  • The system processes one target voice at a time, but it can still support multi-person communication by switching quickly between speech targets; most machine hearing systems focus on a single voice target as the speech source [33].

  • In the stationary noise experiment, the non-stationary noise was added to the experimental conditions in Figure 8, and the target speech source was placed close to the noise source from the speaker to simulate ordinary daily life scenarios.

Summary

Introduction

The number of patients suffering from hearing loss has increased year by year. According to the World Health Organization, around 466 million people worldwide have disabling hearing loss, of whom 34 million are children. As the population grows and ages, it is estimated that 900 million people will suffer from hearing loss by 2050 [1]. Patients with hearing loss have difficulty communicating with others, especially in multi-voice environments, which impairs speech comprehension and can cause social isolation. This condition lowers quality of life and is associated with several mental health conditions [2].
