Abstract

This paper outlines a package synchronization scheme for blind speech watermarking in the discrete wavelet transform (DWT) domain. Following two-level DWT decomposition, watermark bits and synchronization codes are embedded within selected frames in the second-level approximation and detail subbands, respectively. The embedded synchronization code is used for frame alignment and as a location indicator. Tagging voice-active frames with sufficient intensity makes it possible to avoid ineffective watermarking during the silence segments commonly associated with speech utterances. We introduce a novel method referred to as adaptive mean modulation (AMM) to perform binary embedding of packaged information. The quantization steps used in mean modulation are recursively derived from previous DWT coefficients. The proposed formulation allows for the direct assignment of embedding strength. Experimental results show that the proposed DWT-AMM preserves speech quality at a level comparable to that of two other DWT-based methods, which also operate at a payload capacity of 200 bits per second. DWT-AMM exhibits superior robustness in terms of bit error rates, as long as the recovery of the adaptive quantization steps is secured.
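The abstract describes embedding each bit by modulating the mean of a frame of DWT coefficients. The adaptive, recursively derived quantization step of AMM is not specified in this excerpt, so the sketch below uses a fixed step `delta` as a hypothetical stand-in and shows plain mean modulation: the frame mean is quantized onto one of two lattices (offset by half a step) according to the bit, and the detector picks the nearer lattice.

```python
import math

def embed_mean(frame, bit, delta):
    """Shift the frame so its mean lands on the lattice encoding `bit`.

    bit 0 lattice: multiples of delta; bit 1 lattice: offset by delta/2.
    """
    m = sum(frame) / len(frame)
    offset = delta / 2 if bit else 0.0
    target = delta * round((m - offset) / delta) + offset
    shift = target - m
    return [x + shift for x in frame]

def extract_mean(frame, delta):
    """Decide the bit from whichever lattice the frame mean is closer to."""
    m = sum(frame) / len(frame)
    r = m % delta                       # position within one lattice period
    dist0 = min(r, delta - r)           # distance to bit-0 lattice
    dist1 = abs(r - delta / 2)          # distance to bit-1 lattice
    return 1 if dist1 < dist0 else 0
```

A larger `delta` corresponds to a stronger embedding (more robust, more audible), which matches the abstract's point that the formulation allows direct assignment of embedding strength.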

Highlights

  • In the digital era, copyright protection of multimedia data is an important issue for content owners and service providers

  • We introduce a robust blind watermarking scheme for hiding two types of information within embeddable regions of discrete wavelet transform (DWT) subbands designated as information packages

  • Information bits and synchronization codes are embedded within the second-level approximation and detail subbands, respectively

Summary

Introduction

Copyright protection of multimedia data (e.g., images, audio, and video) is an important issue for content owners and service providers. The energy of a speech signal is normally concentrated below 4 kHz. To make the best use of DWT decomposition, we selected the second-level approximation subband for embedding binary information and reserved the second-level detail subband for frame synchronization, given that the speech is sampled at 16 kHz. After taking the two-level DWT of the host signal, the coefficients in the second-level approximation and detail subbands are both partitioned into non-overlapping frames of size Lf. In this study, Lf is tentatively set to 160 to facilitate subsequent scheme development.
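The decomposition and framing step above can be sketched as follows. The excerpt does not name the wavelet, so a Haar filter pair is assumed here purely for illustration; each DWT level halves the number of samples, so for 16 kHz speech the second-level approximation subband covers roughly 0-4 kHz, where speech energy concentrates.

```python
import math

def haar_dwt(x):
    """One level of a Haar DWT: pairwise sums/differences, scaled by 1/sqrt(2)."""
    s = math.sqrt(2)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def two_level_frames(signal, lf):
    """Two-level DWT, then partition both second-level subbands into
    non-overlapping frames of size lf (incomplete tail frames are dropped)."""
    cA1, _cD1 = haar_dwt(signal)
    cA2, cD2 = haar_dwt(cA1)        # cA2: watermark bits; cD2: sync codes
    frames_a = [cA2[i:i + lf] for i in range(0, len(cA2) - lf + 1, lf)]
    frames_d = [cD2[i:i + lf] for i in range(0, len(cD2) - lf + 1, lf)]
    return frames_a, frames_d
```

With Lf = 160, each frame of second-level coefficients corresponds to 640 host samples, i.e., 40 ms of speech at 16 kHz.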

