Abstract

Pitch estimation is an essential task in audio processing because of its key role in many speech and music applications. Still, accurately predicting a continuous value over a wide range of pitch frequencies is challenging. Inspired by the success of filterbank methods in signal processing, we propose a novel deep architecture for accurate pitch estimation. The proposed method is composed of an encoder and multiple decoders. The encoder is a convolutional neural network that learns a representation of the raw audio signal, and its output is fed into a set of decoders. Each decoder is a fully-connected neural network that predicts the pitch value within a specific frequency band. This construction allows each decoder to specialize in a particular frequency regime, which yields a more accurate estimation of pitch values for music and speech signals.
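The encoder and multi-decoder layout described above can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation. The abstract does not specify the frequency range, the number of bands, the layer sizes, or how the decoder outputs are merged, so everything here is an assumption made for illustration: the 50 to 1600 Hz range, the five geometrically spaced bands, the kernel count and length, the per-decoder confidence output, and the rule of taking the most confident decoder. The weights are random and untrained, so the returned pitch is meaningless; only the data flow is shown.

```python
import math
import random


def band_edges(f_min, f_max, n_bands):
    """Split [f_min, f_max] into n_bands geometrically spaced bands (assumed layout)."""
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    return [f_min * ratio ** i for i in range(n_bands + 1)]


def encode(signal, kernels):
    """Toy convolutional encoder: valid 1-D convolution, ReLU, mean pooling per kernel."""
    features = []
    for kernel in kernels:
        n_out = len(signal) - len(kernel) + 1
        acts = [
            max(0.0, sum(kernel[j] * signal[i + j] for j in range(len(kernel))))
            for i in range(n_out)
        ]
        features.append(sum(acts) / n_out)
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BandDecoder:
    """Hypothetical decoder: one fully connected layer restricted to the band [lo, hi]."""

    def __init__(self, n_features, lo, hi, rng):
        self.lo, self.hi = lo, hi
        # Random, untrained weights: one row for pitch, one for confidence.
        self.w_pitch = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        self.w_conf = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]

    def __call__(self, features):
        pos = sigmoid(sum(w * f for w, f in zip(self.w_pitch, features)))
        conf = sigmoid(sum(w * f for w, f in zip(self.w_conf, features)))
        # Map pos in [0, 1] onto the band on a log-frequency scale.
        pitch = self.lo * (self.hi / self.lo) ** pos
        return pitch, conf


def estimate_pitch(signal, kernels, decoders):
    """Run the shared encoder once, then every decoder on the same features."""
    features = encode(signal, kernels)
    outputs = [decoder(features) for decoder in decoders]
    # Assumed merge rule: keep the prediction of the most confident decoder.
    best_pitch, _ = max(outputs, key=lambda pc: pc[1])
    return best_pitch, outputs


rng = random.Random(0)
edges = band_edges(50.0, 1600.0, 5)
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(8)]
decoders = [
    BandDecoder(len(kernels), lo, hi, rng)
    for lo, hi in zip(edges[:-1], edges[1:])
]

sample_rate = 16000
frame = [math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(512)]
pitch, outputs = estimate_pitch(frame, kernels, decoders)
```

Because each decoder's output is mapped onto its own band, it can only ever resolve frequencies inside that band, which is one simple way to realize the specialization the abstract describes. A trained model would replace the random weights and would likely use a deeper encoder and decoders.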
