Abstract
Watermarking is an important measure for protecting proprietary digital multimedia data. This paper presents a novel approach to achieving robust and imperceptible blind speech watermarking on a frame-by-frame basis. The proposed method employs two modules operating in the fast Fourier transform (FFT) domain. The first module is referred to as downward progressive quantization index modulation. It modulates the vector norms drawn from FFT coefficients according to a guideline deduced from human auditory masking properties. The second module is referred to as boundary-constrained iterative adjustment. It provides a smooth transition across frames in the resulting speech waveform. Experimental results confirm the imperceptibility of the proposed modulation scheme in terms of the mean opinion score - listening quality objective (MOS-LQO) based on the perceptual evaluation of speech quality (PESQ) metric. The proposed watermarking method matched or exceeded the performance of five state-of-the-art methods in terms of robustness against common speech processing attacks.
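As a rough illustration of the norm-modulation idea, the sketch below applies plain quantization index modulation (QIM) to the L2 norm of a frame's FFT magnitudes. It is not the paper's downward progressive QIM: the frame length and quantization step used here are arbitrary assumptions, whereas the proposed method derives its modulation guideline from auditory masking properties.

```python
# Minimal sketch: plain QIM on the norm of a frame's FFT magnitudes.
# Assumptions (not from the paper): FRAME_LEN and STEP are illustrative only.
import numpy as np

FRAME_LEN = 512   # assumed frame length in samples
STEP = 0.5        # assumed quantization step; the paper derives steps from masking

def embed_bit(frame: np.ndarray, bit: int, step: float = STEP) -> np.ndarray:
    """Embed one bit by quantizing the L2 norm of the frame's FFT magnitudes."""
    spectrum = np.fft.rfft(frame)
    norm = np.linalg.norm(np.abs(spectrum))
    # Standard QIM: shift the quantization lattice by half a step for bit 1.
    offset = 0.0 if bit == 0 else step / 2.0
    quantized = np.round((norm - offset) / step) * step + offset
    # Rescale the spectrum so its magnitude norm equals the quantized value.
    scale = quantized / norm if norm > 0 else 1.0
    return np.fft.irfft(spectrum * scale, n=len(frame))

def extract_bit(frame: np.ndarray, step: float = STEP) -> int:
    """Decode the bit by checking which lattice the received norm is closer to."""
    norm = np.linalg.norm(np.abs(np.fft.rfft(frame)))
    d0 = abs(norm - np.round(norm / step) * step)
    d1 = abs(norm - (np.round((norm - step / 2) / step) * step + step / 2))
    return 0 if d0 <= d1 else 1

# Round-trip check on a random frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(FRAME_LEN)
assert extract_bit(embed_bit(frame, 1)) == 1
```

Blind extraction works here because the decoder only needs the quantization step, not the original signal; the paper's boundary-constrained iterative adjustment would additionally smooth the waveform across frame boundaries, which this sketch omits.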
Highlights
Rapid advances in computing and communication technology have made it easier than ever to access multimedia data via the internet
We developed a fast Fourier transform (FFT)-based speech watermarking method, which exploits auditory masking to improve the balance between robustness and imperceptibility
We developed a two-phase speech watermarking method in the FFT domain, referred to as perceptual vector norm modulation (PVNM)
Summary
Rapid advances in computing and communication technology have made it easier than ever to access multimedia data via the internet. The most common approach to protecting online multimedia data is digital watermarking, which involves embedding confidential information pertaining to ownership within the media file. The discrete wavelet transform (DWT), which captures both frequency and location information, has long been a standard approach in speech/audio watermarking [4], [13], [14]. Saadi et al. [15] performed norm-space watermarking in a hybrid domain formed by the DWT and the discrete cosine transform (DCT) in tandem. Their method provides a good tradeoff between imperceptibility and robustness. We developed an FFT-based speech watermarking method, which exploits auditory masking to improve the balance between robustness and imperceptibility.
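For context on the imperceptibility measure cited in the abstract, the hedged snippet below shows one way an MOS-LQO score could be computed for a watermarked signal using the open-source `pesq` package (a wrapper around ITU-T P.862). The tooling is an assumption, not necessarily what the authors used, and the file names are hypothetical.

```python
# Hedged example: scoring imperceptibility of a watermarked signal with PESQ (MOS-LQO).
# The `pesq` PyPI package and the file names below are assumptions for illustration.
import soundfile as sf
from pesq import pesq

ref, fs = sf.read("original.wav")      # hypothetical clean reference speech
deg, _ = sf.read("watermarked.wav")    # hypothetical watermarked version

# PESQ requires fs of 8000 ('nb' mode) or 16000 ('wb' mode).
mos_lqo = pesq(fs, ref, deg, 'wb')
print(f"MOS-LQO: {mos_lqo:.2f}")
```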