Abstract

Watermarking is an important measure for protecting proprietary digital multimedia data. This paper presents a novel approach to achieving robust and imperceptible blind speech watermarking on a frame-by-frame basis. The proposed method employs two modules operating in the fast Fourier transform (FFT) domain. The first module, referred to as downward progressive quantization index modulation, modulates the vector norms drawn from FFT coefficients according to a guideline deduced from human auditory masking properties. The second module, referred to as boundary-constrained iterative adjustment, provides a smooth transition across frames in the resulting speech waveform. Experimental results confirm the imperceptibility of the proposed modulation scheme in terms of the mean opinion score - listening quality objective (MOS-LQO) based on the perceptual evaluation of speech quality (PESQ) metric. The proposed watermarking method matched or exceeded the performance of five state-of-the-art methods in terms of robustness against common speech processing attacks.

Highlights

  • Rapid advances in computing and communication technology have made it easier than ever to access multimedia data via the internet

  • We developed a fast Fourier transform (FFT)-based speech watermarking method, which exploits auditory masking to improve the balance between robustness and imperceptibility

  • We developed a two-phase speech watermarking method in the FFT domain, referred to as perceptual vector norm modulation (PVNM)


Summary

INTRODUCTION

Hu et al.: Robust Blind Speech Watermarking via FFT-Based PVNM With Frame Self-Synchronization.

Rapid advances in computing and communication technology have made it easier than ever to access multimedia data via the internet. The most common approach to protecting online multimedia data is digital watermarking, which involves embedding confidential information pertaining to ownership within the media file. The discrete wavelet transform (DWT), which captures both frequency and location information, has long been a standard approach in speech/audio watermarking [4], [13], [14]. Saadi et al. [15] performed norm-space watermarking in a hybrid domain formed by the DWT and DCT in tandem. Their method provides a good tradeoff between imperceptibility and robustness. We developed an FFT-based speech watermarking method, which exploits auditory masking to improve the balance between robustness and imperceptibility.
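To make the idea of modulating a vector norm concrete, the following is a minimal sketch of plain quantization index modulation (QIM) applied to the norm of a coefficient vector. It is not the downward progressive QIM described in the paper: the fixed step size, the quarter-step lattice offsets, and the function names are all illustrative assumptions, and no auditory masking model is included. The input vector is assumed to hold FFT coefficient magnitudes taken from one frame.

```python
import math


def vector_norm(v):
    """Euclidean norm of a list of real coefficients."""
    return math.sqrt(sum(x * x for x in v))


def qim_embed(v, bit, step):
    """Rescale v so that its norm encodes one bit (illustrative plain QIM).

    Assumption: within each interval of width `step`, the target norm sits
    at step/4 for bit 0 and at 3*step/4 for bit 1.
    """
    norm = vector_norm(v)
    if norm == 0.0:
        return list(v)  # an all-zero vector cannot be rescaled
    target = math.floor(norm / step) * step + step / 4 + bit * step / 2
    scale = target / norm
    return [x * scale for x in v]


def qim_extract(v, step):
    """Blind extraction: only the received vector and the step are needed."""
    remainder = math.fmod(vector_norm(v), step)
    return 0 if remainder < step / 2 else 1
```

For example, `qim_extract(qim_embed([3.0, 4.0], 1, 2.0), 2.0)` recovers the embedded bit without access to the original vector, which is what makes the scheme blind. In this sketch the decision margin is a quarter of the step, so a larger step tolerates more distortion at the cost of a larger change to the signal.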

PROTOCOL FOR ARRANGEMENT OF INFORMATION BITS
PERCEPTUAL CONSIDERATIONS IN DETERMINING
PERFORMANCE EVALUATION
CONCLUSION