Pitch class profile (PCP), which represents the harmonic progression of a piece of music well, is one of the most widely used audio features for cover version identification. In this letter, we describe a novel procedure that enhances PCP by substantially increasing its invariance to instrumental accompaniment without degrading its discriminative power. Our idea rests on the assumption that human listeners can quickly and easily identify a cover of a pop song from its singing voice. We therefore combine two psychoacoustic concepts that have been used successfully in speech recognition, (i) the time-varying loudness contour and (ii) the critical band, with the conventional PCP descriptor to enhance its discriminative power. Experimental results demonstrate that the resulting feature, called the cochlear pitch class profile (CPCP), outperforms the conventional PCP feature in pop cover song identification. Because CPCPs aim to represent the singing voice, they may also perform better than conventional PCPs on a cappella singing recordings.
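To make concrete what the conventional PCP baseline computes, here is a minimal sketch of a standard 12-bin PCP (chroma) vector for a single audio frame. It is not the authors' CPCP: the loudness-contour and critical-band steps are not specified in this abstract and are therefore omitted. The function name `pitch_class_profile` and the parameters `f_ref` (assumed here to be C4 = 261.63 Hz, so that bin 0 is C), `f_min`, and `f_max` are illustrative assumptions. The sketch maps each spectral bin to its nearest equal-tempered semitone, folds the result into one octave, and sums the spectral energy per pitch class.

```python
import numpy as np


def pitch_class_profile(frame, sample_rate, f_ref=261.63, f_min=55.0, f_max=5000.0):
    """Hypothetical sketch of a conventional 12-bin PCP for one frame.

    Bin 0 corresponds to the pitch class of f_ref (C when f_ref is C4).
    """
    # Windowed power spectrum of the frame.
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only bins in the analysed range (this also excludes the DC bin).
    valid = (freqs >= f_min) & (freqs <= f_max)

    # Nearest semitone relative to f_ref, folded into one octave (0..11).
    semitones = np.round(12 * np.log2(freqs[valid] / f_ref)).astype(int)
    pitch_classes = semitones % 12

    # Accumulate spectral energy per pitch class.
    pcp = np.zeros(12)
    np.add.at(pcp, pitch_classes, spectrum[valid])

    # Normalise so the profile sums to 1 (leave an all-zero frame unchanged).
    total = pcp.sum()
    return pcp / total if total > 0 else pcp
```

For example, a 440 Hz sine tone (A4) concentrates its energy in bin 9 (A), and a C major triad peaks in bins 0, 4, and 7 (C, E, G), regardless of octave. That octave folding is what lets PCP track harmonic progression, but it also means accompaniment energy lands in the same bins as the voice, which is the sensitivity CPCP is designed to reduce.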