Abstract

The adaptive multi-rate (AMR) audio codec, adopted by many portable recording devices, is widely used for speech compression, and AMR speech recordings are increasingly used as evidence in court. Because digital speech recordings are now easy to tamper with, audio forensics is increasingly important, and detecting double compressed audio is one of its key problems. In this paper, we propose a framework for detecting double compressed AMR audio based on a stacked autoencoder (SAE) network and the universal background model-Gaussian mixture model (UBM-GMM). Instead of relying on hand-crafted features, we use the SAE to learn features automatically from the audio waveforms. Audio frames are the network input, and the output of the last hidden layer constitutes the features of a single frame. For an audio clip with many frames, the features of all frames are aggregated and classified by the UBM-GMM. Experimental results show that our method is effective in distinguishing single from double compressed AMR audio and outperforms existing methods, achieving a detection accuracy of 98% on the TIMIT database. Extensive experiments demonstrate the effectiveness and robustness of the proposed method.
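
The pipeline described above (frames in, last-hidden-layer features out, clip-level decision over all frames) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sigmoid activation, non-overlapping framing, diagonal covariances, and the averaged log-likelihood-ratio decision rule are all assumptions made here, and the UBM training and adaptation step is omitted, so the two GMMs are simply passed in as given parameters.

```python
import numpy as np

def split_into_frames(waveform, frame_len):
    """Cut a 1-D waveform into non-overlapping frames, dropping the remainder."""
    n_frames = len(waveform) // frame_len
    return waveform[: n_frames * frame_len].reshape(n_frames, frame_len)

def encode_frames(frames, layers):
    """Forward pass through stacked encoder layers (hypothetical weights).

    `layers` is a list of (W, b) pairs; the output of the last layer
    serves as the per-frame feature vector.
    """
    h = frames
    for W, b in layers:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))  # sigmoid activation (assumed)
    return h

def gmm_frame_loglik(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    x: (n, d); weights: (k,); means, variances: (k, d).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)
    # log-sum-exp over mixture components, stabilized by the row maximum
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True))).ravel()

def clip_score(features, gmm_double, gmm_single):
    """Average per-frame log-likelihood ratio over a clip.

    A positive score favors the 'double compressed' model (assumed rule).
    """
    return float(np.mean(gmm_frame_loglik(features, *gmm_double)
                         - gmm_frame_loglik(features, *gmm_single)))
```

In such a sketch, a clip would be labeled by thresholding `clip_score` at zero (or at a tuned threshold); averaging over frames is what lets clips of different lengths be compared on the same scale.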
