This paper proposes a multiple-F0 estimation algorithm for automatic polyphonic music transcription. The algorithm operates at the frame level, searching at each audio frame for the set of fundamental frequencies that minimizes a spectral distance measure. The measure is defined under the assumption that a polyphonic sound can be modelled as a weighted sum of Gaussian spectral models. Because the spectral content of a polyphonic music signal at the current frame depends to a large extent on the immediately preceding frames, the spectral distance measure takes into account information not only from the current frame but also from several previous ones. A further performance improvement is achieved by applying a Hidden Markov Model (HMM) as the final stage of the algorithm. The proposed algorithm is tested on real-world polyphonic music recordings taken from the RWC music database. Accuracy rates are reported for the algorithm under different conditions, together with a breakdown of the total error into three categories (substitutions, misses and false alarms). Finally, the algorithm is compared with five recent state-of-the-art transcription systems.
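
To make the frame-level search concrete, the following is a minimal sketch of the idea rather than the paper's implementation. It assumes each note is modelled by Gaussian bumps at its harmonics, fits the weights of the sum by least squares, and tries every candidate set up to a small polyphony. The function names, the 1/h harmonic decay, the Gaussian width `sigma`, the Euclidean residual used as the distance, and the candidate list are all illustrative assumptions. The temporal context from previous frames and the HMM stage are left out.

```python
from itertools import combinations

import numpy as np


def harmonic_template(f0, freqs, n_harmonics=8, sigma=10.0):
    """Gaussian spectral model of one note: a Gaussian bump at each harmonic of f0.

    The 1/h amplitude decay and the width sigma (in Hz) are assumptions
    made for illustration only.
    """
    template = np.zeros_like(freqs)
    for h in range(1, n_harmonics + 1):
        template += (1.0 / h) * np.exp(-0.5 * ((freqs - h * f0) / sigma) ** 2)
    return template


def spectral_distance(spectrum, f0_set, freqs):
    """Residual norm after fitting the spectrum with a weighted sum of templates."""
    basis = np.stack([harmonic_template(f0, freqs) for f0 in f0_set], axis=1)
    weights, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    weights = np.clip(weights, 0.0, None)  # note weights cannot be negative
    return float(np.linalg.norm(spectrum - basis @ weights))


def estimate_f0_set(spectrum, candidates, freqs, max_polyphony=2):
    """Exhaustively search candidate F0 sets for the smallest spectral distance."""
    # Start from the empty set, whose distance is the norm of the spectrum itself.
    best_set, best_dist = (), float(np.linalg.norm(spectrum))
    for k in range(1, max_polyphony + 1):
        for f0_set in combinations(candidates, k):
            dist = spectral_distance(spectrum, f0_set, freqs)
            if dist < best_dist:
                best_set, best_dist = f0_set, dist
    return best_set, best_dist


# Synthetic frame: two notes (A3 and E4) mixed with different weights.
freqs = np.linspace(0.0, 4000.0, 2048)
candidates = [220.0, 277.18, 329.63, 440.0]
frame = 1.0 * harmonic_template(220.0, freqs) + 0.6 * harmonic_template(329.63, freqs)

best_set, best_dist = estimate_f0_set(frame, candidates, freqs)
print(best_set, best_dist)
```

The exhaustive search grows combinatorially with the number of candidates and the allowed polyphony, so a practical system would need to prune the candidate list first; the sketch only illustrates the minimization criterion.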