Abstract

Audio fingerprinting methods compress audio content into compact signatures, which saves storage and reduces query time. The technology is widely used in audio retrieval, music information retrieval, and audio authentication. However, most existing methods fail to balance recognition accuracy, query speed, and storage size. This letter presents a novel self-supervised learning scheme, called asymmetric contrastive learning, that generates binary hash fingerprints of audio segments. We also design a new loss function, the bidirectional asymmetric pairwise loss, to minimize information loss. Experimental results show that our scheme achieves a high top-1 hit rate on both music and speech datasets. Furthermore, the proposed scheme outperforms previous real-valued fingerprinting work in query speed and storage size.
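The storage and query-speed argument for binary fingerprints can be illustrated with a small sketch. It is not the paper's method: neither the asymmetric contrastive learning scheme nor the bidirectional asymmetric pairwise loss is reproduced here. Instead, it assumes a hypothetical real-valued embedding per audio segment, sign-binarizes it into a packed bit code, and retrieves the top-1 match by Hamming distance (XOR plus bit count). The 128-bit length, the random database, and the Gaussian-noise distortion are illustrative assumptions.

```python
import random

DIM = 128  # hypothetical fingerprint length: 128 embedding dimensions -> 128 bits


def binarize(embedding):
    """Sign-binarize a real-valued embedding into a packed integer, one bit per dimension.

    Plain sign thresholding is an assumption for illustration; the paper learns its
    hash codes with a dedicated loss instead.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count the bits that differ between two packed codes."""
    return bin(a ^ b).count("1")


def top1(query_code, database):
    """Return the index of the database code closest to the query in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query_code, database[i]))


def storage_bytes(dim):
    """Bytes per fingerprint: float32 real-valued vector vs. packed binary code."""
    return {"float32": dim * 4, "binary": (dim + 7) // 8}


# Usage: a synthetic database of 1000 hypothetical segment embeddings.
random.seed(0)
db_embeddings = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(1000)]
database = [binarize(e) for e in db_embeddings]

# Simulated distorted query: segment 42 plus small Gaussian noise (hypothetical distortion model).
query = [v + random.gauss(0.0, 0.1) for v in db_embeddings[42]]
print("top-1 index:", top1(binarize(query), database))
print(storage_bytes(DIM))  # {'float32': 512, 'binary': 16}
```

With these assumptions, a 128-dimensional float32 fingerprint takes 512 bytes while the packed 128-bit code takes 16, and comparing two codes reduces to an XOR and a bit count instead of a floating-point distance computation.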
