Abstract

This paper proposes a detection algorithm for each section which detects the voiced section, unvoiced section, and the silence section at each frame using a multi-layer perceptron neural network. First, a power spectrum and FFT (fast Fourier transform) coefficients obtained by FFT are used as the input to the neural network for each frame, then the neural network is trained using these power spectrum and FFT coefficients. In this experiment, the performance of the proposed algorithm for detection of the voiced section, unvoiced section, and silence section was evaluated based on the detection rates using various speeches, which are degraded by white noise and used as the input data of the neural network. In this experiment, the detection rates were 92% or more for such speech and white noise when training data and evaluation data were the different.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call