Abstract

A system is proposed that identifies the solo instrument in accompanied sonatas and concertos, and uses this knowledge to extract the melody line played by this instrument. The approach uses a feature representation based solely on the spectral peaks belonging to the harmonic series of a fundamental frequency (F0). Based on an initial approximate F0 estimation, this representation proved to be sufficient for instrument classification even in the presence of highly unpredictable background accompaniment. Once the solo instrument is known, a more accurate estimation of the melody line is carried out based on so-called melody models, which are trained on instrument-specific training material. In every time frame, multiple F0 candidates are extracted and their likelihood is evaluated according to the chosen melody model. Additional temporal constraints take the form of frame-to-frame transition probabilities, and are obtained from the same training material. The two knowledge sources are combined in a statistical search for the overall most likely melody line. When evaluated on realistic recordings of classical sonatas and concertos, the system was able to find the correct F0 in 72% of frames, an improvement of over 20% compared to a simple salience-based approach.

a) Previously at University of Sheffield.
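The abstract's "statistical search for the overall most likely melody line" — combining per-frame candidate likelihoods with frame-to-frame transition probabilities — is typically realized as a Viterbi-style dynamic program. The sketch below illustrates that general scheme, not the paper's exact implementation: the function names and the toy emission/transition scores are assumptions for illustration only.

```python
import math

def viterbi_melody(candidates, emission_logp, transition_logp):
    """Find the most likely F0 trajectory through per-frame candidates.

    candidates:       list of lists; candidates[t] holds the F0 candidates
                      (in Hz) extracted in frame t
    emission_logp:    emission_logp(t, f0) -> log-likelihood of candidate f0
                      under the (instrument-specific) melody model
    transition_logp:  transition_logp(f0_prev, f0) -> log-probability of the
                      frame-to-frame transition between two candidates
    """
    T = len(candidates)
    # delta[t][i]: best log-score of any path ending at candidate i in frame t
    delta = [[emission_logp(0, f) for f in candidates[0]]]
    back = [[0] * len(candidates[0])]
    for t in range(1, T):
        row, ptr = [], []
        for f in candidates[t]:
            scores = [delta[t - 1][i] + transition_logp(p, f)
                      for i, p in enumerate(candidates[t - 1])]
            best = max(range(len(scores)), key=scores.__getitem__)
            row.append(scores[best] + emission_logp(t, f))
            ptr.append(best)
        delta.append(row)
        back.append(ptr)
    # Backtrack from the best-scoring final candidate
    i = max(range(len(delta[-1])), key=delta[-1].__getitem__)
    path = [candidates[-1][i]]
    for t in range(T - 1, 0, -1):
        i = back[t][i]
        path.append(candidates[t - 1][i])
    path.reverse()
    return path

# Toy demo: the "melody model" favors candidates above 300 Hz, and
# transitions are penalized by the jump in log-frequency (octave distance).
demo_path = viterbi_melody(
    [[220.0, 440.0], [225.0, 445.0], [220.0, 442.0]],
    lambda t, f: 0.0 if f > 300 else -5.0,
    lambda p, f: -abs(math.log2(f / p)),
)
```

Working in log-probabilities (summing rather than multiplying) keeps the search numerically stable over long recordings, which is standard practice for this kind of decode.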
