Abstract

Audio is a widely used medium for transferring information. However, problems such as audio copyright detection and music identification call for a general audio identification system. A major barrier to developing such a system is locating the source audio in a huge database of audio tracks using only a very short, noisy input clip. Reducing the complexity of this search problem was the key objective that led to the concept of audio fingerprinting. Inspired by the human fingerprint, an audio fingerprint can be defined as a compact, content-based signature that summarizes an audio track. The technology has attracted considerable attention because it is independent of the audio format and requires neither metadata nor watermark embedding. The proposed methodology adopts a two-stage, feature-extraction-based approach to generate an audio fingerprint, which is then used to search a database of audio tracks for the source audio. Two high-level features extracted from a vanilla spectrogram, the Mel-spectrogram and the Mel-frequency cepstral coefficients (MFCCs), serve as the basic audio features. An advanced feature-extraction algorithm then processes these two basic features separately and combines the results into the final fingerprint. The proposed methodology has been analyzed with different hyperparameters on standard and sample datasets to obtain optimal results.

Keywords: Audio fingerprinting; Feature extraction; Mel-spectrogram; Mel-frequency cepstral coefficients (MFCCs); Fast Fourier transform (FFT); Short-time Fourier transform (STFT); Discrete cosine transform (DCT)
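The abstract names the chain of basic features (STFT spectrogram, Mel-spectrogram, and MFCCs obtained with the DCT) but not the exact parameters. The sketch below illustrates that first stage in dependency-free Python under stated assumptions: an HTK-style mel scale, a periodic Hann window, triangular mel filters, and an orthonormal DCT-II. The frame length, hop, and filter counts are illustrative choices, not the paper's hyperparameters, and a naive DFT stands in for the FFT to keep the example self-contained. The second-stage fingerprint algorithm is not shown because the abstract does not describe it.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (assumed; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrogram(signal: List[float], frame_len: int, hop: int) -> Matrix:
    """STFT power spectrogram: periodic-Hann-windowed frames, naive DFT (stands in for an FFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            im = -sum(x * math.sin(2 * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            row.append(re * re + im * im)
        frames.append(row)
    return frames


def mel_filterbank(n_mels: int, frame_len: int, sr: int) -> Matrix:
    """Triangular filters with edges equally spaced on the mel scale from 0 Hz to sr/2."""
    top = hz_to_mel(sr / 2)
    hz_pts = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / frame_len for k in range(frame_len // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))
            elif center < f <= right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(x: List[float], n_coeffs: int) -> List[float]:
    """Orthonormal DCT-II; keeps the first n_coeffs coefficients."""
    M = len(x)
    out = []
    for i in range(n_coeffs):
        scale = math.sqrt((1.0 if i == 0 else 2.0) / M)
        out.append(scale * sum(v * math.cos(math.pi * i * (2 * m + 1) / (2 * M)) for m, v in enumerate(x)))
    return out


def mel_and_mfcc(signal: List[float], sr: int, frame_len: int = 256, hop: int = 128,
                 n_mels: int = 20, n_mfcc: int = 13) -> Tuple[Matrix, Matrix]:
    power = power_spectrogram(signal, frame_len, hop)
    bank = mel_filterbank(n_mels, frame_len, sr)
    # Mel-spectrogram: filterbank energies per frame.
    mel_spec = [[sum(w * p for w, p in zip(filt, frame)) for filt in bank] for frame in power]
    # MFCCs: DCT of the log mel energies (small epsilon avoids log(0)).
    mfcc = [dct_ii([math.log(e + 1e-10) for e in row], n_mfcc) for row in mel_spec]
    return mel_spec, mfcc


sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 4)]  # 0.25 s, 440 Hz
mel_spec, mfcc = mel_and_mfcc(tone, sr)
print(len(mfcc), len(mfcc[0]))  # 14 frames x 13 coefficients
```

In practice an FFT routine would replace the O(N²) DFT loop; the mel filtering, log, and DCT steps stay the same.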
