Abstract

Audio identification via fingerprinting has been an active research field for years. However, most previously reported methods operate on raw audio, even though compressed audio, especially MP3 music, has become the dominant way to store music on personal computers and/or to transmit it over the Internet. It would therefore be useful to recognize an unknown compressed audio fragment directly against the database, without first decompressing it into waveform format. So far, very few music information retrieval algorithms operate directly in the compressed domain, and most of them rely on modified discrete cosine transform (MDCT) coefficients or on derived cepstral and energy features. As a first attempt, this paper proposes using the compressed-domain auditory Zernike moment, adapted from image processing techniques, as the key feature of a novel robust audio identification algorithm. Owing to its statistically stable nature, this fingerprint is strongly robust against various audio signal distortions, such as recompression, noise contamination, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification. Experimental results on a music database of 21,185 MP3 songs show that a 10-s music segment can identify its original near-duplicate recording with an average top-5 hit rate reaching 90% or above, even under severe audio signal distortions.
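
The abstract evaluates identification by the average top-5 hit rate. The matching procedure itself is not described in this section, so the following is only a minimal sketch under stated assumptions: each song and each query clip is reduced to a single fixed-length fingerprint vector, songs are ranked by Euclidean distance, and a query counts as a hit when its true song appears among the k nearest. The names `rank_songs` and `top_k_hit_rate`, and the choice of Euclidean distance, are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical: one fixed-length fingerprint vector per song or query clip.
Fingerprint = Sequence[float]


def euclidean(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_songs(query: Fingerprint, database: Dict[str, Fingerprint]) -> List[str]:
    """Song IDs ordered from nearest to farthest fingerprint."""
    return sorted(database, key=lambda song_id: euclidean(query, database[song_id]))


def top_k_hit_rate(queries: List[Tuple[str, Fingerprint]],
                   database: Dict[str, Fingerprint], k: int = 5) -> float:
    """Fraction of (true_id, fingerprint) queries whose true song is ranked in the top k."""
    if not queries:
        return 0.0
    hits = sum(1 for true_id, fp in queries if true_id in rank_songs(fp, database)[:k])
    return hits / len(queries)


if __name__ == "__main__":
    db = {"song_a": [0.0, 0.0], "song_b": [1.0, 1.0], "song_c": [5.0, 5.0]}
    queries = [("song_a", [0.1, 0.0]), ("song_c", [4.8, 5.1])]
    print(top_k_hit_rate(queries, db, k=1))  # 1.0
```

A real system comparing 10-s segments would more likely match sequences of frame-level fingerprints rather than one vector per clip; the single-vector simplification only keeps the metric itself easy to see.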

Highlights

  • As an emerging form of entertainment, online music services such as listening, downloading, identification, and searching have been among the most popular applications on the World Wide Web for several years

  • Lie and Su [4] directly used selected modified discrete cosine transform (MDCT) spectral coefficients, together with the derived sub-band energy and its variation, to represent the tonic characteristic of a short-term sound and to match two audio segments

  • Can we develop a new type of compressed-domain feature that achieves high robustness in audio fingerprinting? It is well known that Zernike moments have been widely used in many image-related research fields, such as image recognition [11], image watermarking [12], human face recognition [13], and image analysis [14], owing to their strong robustness and their invariance to rotation, scale, and translation (RST)
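
For readers unfamiliar with Zernike moments, the sketch below computes them for a small 2-D array, for example a block of MDCT magnitudes arranged as granules × frequency lines. How the paper actually builds its auditory Zernike moments (which coefficients, block size, moment orders, normalization) is not given in this section, so the pixel-to-disk mapping, the area-weighted normalization, and the function names `radial_poly`, `zernike_moment`, and `zernike_magnitudes` are assumptions. The sketch uses the standard radial polynomial R_nm(ρ) and basis V_nm(ρ, θ) = R_nm(ρ)e^{jmθ}, and keeps only |Z_nm|, which does not change when the input is rotated about its centre.

```python
import cmath
import math
from typing import List, Sequence


def radial_poly(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_nm(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(img: Sequence[Sequence[float]], n: int, m: int) -> complex:
    """Discrete Zernike moment Z_nm of a 2-D array mapped onto the unit disk.

    Assumed convention: pixel centres are mapped linearly onto [-1, 1] along
    each axis, pixels outside the unit circle are skipped, and each pixel is
    weighted by its area (2/W)*(2/H) to approximate the continuous integral.
    """
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    height, width = len(img), len(img[0])
    acc = 0j
    for i, row in enumerate(img):
        y = (2 * i + 1 - height) / height
        for j, value in enumerate(row):
            x = (2 * j + 1 - width) / width
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue
            theta = math.atan2(y, x)
            # Multiply by the conjugate basis V*_nm = R_nm(rho) * exp(-j*m*theta).
            acc += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * (2 / width) * (2 / height)


def zernike_magnitudes(img: Sequence[Sequence[float]], max_order: int) -> List[float]:
    """Rotation-invariant feature vector |Z_nm| for all valid 0 <= m <= n <= max_order."""
    return [abs(zernike_moment(img, n, m))
            for n in range(max_order + 1)
            for m in range(n % 2, n + 1, 2)]
```

For example, `zernike_magnitudes(block, 4)` returns nine values, one per valid (n, m) pair with 0 ≤ m ≤ n ≤ 4. Only the magnitude is rotation invariant; scale and translation invariance are usually obtained by normalizing the input (for example, centring it on its centroid and rescaling) before the moments are computed.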

Summary

Introduction

As an emerging form of entertainment, online music services such as listening, downloading, identification, and searching have been among the most popular applications on the World Wide Web for several years. Various compressed-domain audio features, including scale factors [15,16], the MP3 window-switching pattern [17,18], basic MDCT coefficients and derived spectral energy, energy variation, duration of energy peaks, amplitude envelope, spectral centroid, spectral spread, spectral flux, roll-off, RMS, and rhythmic content such as the beat histogram [19,20,21,22,23,24], have been used in applications such as retrieval, segmentation, genre classification, speech/music discrimination, summarization, singer identification, watermarking, and beat tracking/tempo induction. Despite their extensive use in various image-related research fields for years, Zernike moments have not yet, to the authors' knowledge, been applied to music information retrieval. This motivated our initial idea of developing compressed-domain Zernike moments for audio fingerprinting.
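
Several of the features listed above can be computed straight from MDCT coefficients, with no inverse transform. The sketch below is a simplified illustration, not the definition used in any of the cited works: it assumes each frame is a plain list of MDCT coefficients, splits it into equal-width sub-bands, and computes sub-band energy, frame-to-frame energy variation, spectral centroid (in bins rather than Hz), spectral flux, and RMS. The function names and the exact formulas (for example, flux as an unnormalized squared difference) are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers: MDCT-domain frame descriptors computed without an inverse MDCT.


def subband_energies(frame: Sequence[float], num_bands: int) -> List[float]:
    """Energy (sum of squared MDCT coefficients) in num_bands equal-width bands.

    Coefficients left over when len(frame) is not divisible by num_bands are ignored.
    """
    width = len(frame) // num_bands
    return [sum(c * c for c in frame[b * width:(b + 1) * width]) for b in range(num_bands)]


def spectral_centroid(frame: Sequence[float]) -> float:
    """Magnitude-weighted mean coefficient index (in MDCT bins, not Hz)."""
    mags = [abs(c) for c in frame]
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0


def spectral_flux(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Sum of squared differences between consecutive magnitude spectra."""
    return sum((abs(a) - abs(b)) ** 2 for a, b in zip(prev, curr))


def frame_features(frames: List[Sequence[float]], num_bands: int = 4) -> List[Dict[str, object]]:
    """Per-frame sub-band energies, energy variation, centroid, flux, and RMS."""
    features: List[Dict[str, object]] = []
    prev_total = None
    for idx, frame in enumerate(frames):
        energies = subband_energies(frame, num_bands)
        total = sum(energies)
        features.append({
            "subband_energy": energies,
            # Change in total band energy relative to the previous frame (0 for the first).
            "energy_variation": 0.0 if prev_total is None else total - prev_total,
            "centroid": spectral_centroid(frame),
            "flux": 0.0 if idx == 0 else spectral_flux(frames[idx - 1], frame),
            "rms": math.sqrt(sum(c * c for c in frame) / len(frame)),
        })
        prev_total = total
    return features
```

In an actual MP3 stream, each long-block granule carries 576 MDCT coefficients per channel, and the format groups them into scalefactor bands of unequal width; a version closer to practice would use those band boundaries instead of the equal-width split assumed here.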

Compressed domain auditory Zernike moment
Compressed domain audio features
Findings
Conclusion
