Abstract

The role of the statistical model-based voice activity detector (SMVAD) is to detect speech regions from input signals using the statistical models of noise and noisy speech. The decision rule of SMVAD is based on the likelihood ratio test (LRT). The LRT-based decision rule may cause detection errors because of the statistical properties of noise and speech signals. In this article, we first analyze the reasons why the detection errors occur and then propose two modified decision rules using reliable likelihood ratios (LRs). We also propose an effective weighting scheme considering the spectral characteristics of noise and speech signals. In the experiments conducted in this study, the proposed methods show significant performance improvement in various noise conditions with almost no additional computation. Experimental results also show that the proposed weighting scheme provides additional performance improvement over the two proposed SMVADs.

Highlights

  • The purpose of a voice activity detector (VAD) is to discriminate between speech and non-speech regions from the input signals in various noisy conditions

  • Statistical model-based voice activity detector: we briefly review the overall process of the statistical model-based VAD (SMVAD), which uses the complex Gaussian probability density function (PDF) to detect speech regions in adverse noise environments

  • Weighting scheme considering the reliability of likelihood ratios (LRs): based on the analysis of the likelihood ratio test (LRT)-based decision rule, we propose a weighting scheme that reflects the reliability of each log-likelihood ratio (LLR)


Summary

Introduction

The purpose of a voice activity detector (VAD) is to discriminate between speech and non-speech regions of the input signal under various noise conditions. In a speech frame, even when the a priori SNR is estimated to be high, the LLR values become low when the input signal power is lower than the estimated noise variance. In a noise frame, the a posteriori SNRs in a low-powered frequency region can be high and can push certain LLR values as high as the levels seen in a speech frame. When these LLRs are used in the decision rule, some of these noise frames can be detected as speech frames. On the contrary, based on our analysis, if we select or properly weight the LLRs for the decision rule, we can detect the speech frame, because all of the LLRs in the low-powered region in Figure 6b are either excluded from the decision or reduced by weights that attenuate the effect of unreliable LLRs.
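The effect described above can be illustrated with a minimal sketch. It assumes the per-bin LLR widely used for complex Gaussian SMVADs, LLR = γξ/(1+ξ) − log(1+ξ), where γ is the a posteriori SNR and ξ is the a priori SNR. The function names `bin_llr` and `frame_decision`, the threshold, and the example weights are hypothetical choices for illustration only; they are not the decision rules or the weighting scheme proposed in the article.

```python
import numpy as np


def bin_llr(gamma, xi):
    """Per-bin log-likelihood ratio under the complex Gaussian model.

    gamma: a posteriori SNR (input power / estimated noise variance)
    xi:    a priori SNR (speech variance / noise variance)
    """
    gamma = np.asarray(gamma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return gamma * xi / (1.0 + xi) - np.log1p(xi)


def frame_decision(llr, threshold=0.0, weights=None):
    """Weighted average of per-bin LLRs compared against a threshold.

    With weights=None every bin counts equally (the plain LRT rule).
    The weights are an illustrative assumption, not the article's scheme.
    """
    llr = np.asarray(llr, dtype=float)
    if weights is None:
        weights = np.ones_like(llr)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    score = float(np.sum(weights * llr) / total)
    return score > threshold, score


# High a priori SNR, but input power below the estimated noise variance:
# the LLR is negative even though the bin "looks" like speech.
low_power = bin_llr(gamma=0.5, xi=10.0)

# Same a priori SNR with strong input power: the LLR is clearly positive.
high_power = bin_llr(gamma=20.0, xi=10.0)

# A frame whose unreliable bins drag the average below the threshold.
frame_llr = np.array([-3.0, -3.0, 1.0])
plain_is_speech, plain_score = frame_decision(frame_llr)
weighted_is_speech, weighted_score = frame_decision(
    frame_llr, weights=[0.0, 0.0, 1.0]  # hypothetical reliability weights
)
```

In this toy frame the equally weighted rule rejects the frame, while down-weighting the two bins treated as unreliable lets the remaining bin decide. Setting a weight to zero corresponds to excluding that LLR from the decision, and a fractional weight corresponds to attenuating it.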
