Abstract

A method for embedding data into speech signals without bandwidth expansion is proposed. Sampled speech is assembled into contiguous blocks of N samples, and the Discrete Fourier Transform (DFT) is computed for each block. When unvoiced speech is present, all the phase components in the message band are discarded; when voiced speech is present, only the last J components in this band are discarded. The data are inserted in place of these discarded phase components, with +π/2 representing a logical 0 and −π/2 a logical 1. The magnitudes of the coefficients associated with the data-carrying phase components are scaled to guard against data errors caused by channel noise. The inverse DFT yields the transmitted sequence. The receiver performs the inverse process, extracting the data and replacing the data-carrying phases with random phase values. For an average transmission rate of approximately 1 kb/s and a channel signal-to-noise ratio of 30 dB, the bit error rate was 5.5 × 10⁻⁴, and the average signal-to-noise ratios for voiced and unvoiced speech were 24 dB and −3 dB, respectively. However, the unvoiced sounds were perceived with negligible distortion because their magnitude spectra were preserved. Modest error-correction codes can reduce the bit error rate to 10⁻⁴ while maintaining the same recovered speech quality, provided the average transmitted bit rate is decreased to ≃500 b/s.
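The per-block embedding and recovery steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The block length, the message-band limits `k_lo`/`k_hi`, the value of `J`, and the gain `alpha` are hypothetical choices. The voicing flag is assumed to come from an external voiced/unvoiced detector and to be known at the receiver. The magnitude scaling is assumed to be a plain multiplication by `alpha` that the receiver divides out. A naive O(N²) DFT from the standard library stands in for an FFT, and the mirror bin N − k is set to the complex conjugate so that the transmitted block stays real.

```python
import cmath
import math
import random

# Illustrative sketch only. Block length, message-band limits (k_lo..k_hi),
# J, and the gain alpha are hypothetical choices, not values from the paper.


def dft(x):
    """Naive O(N^2) DFT (stands in for an FFT in this sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT; the real part is kept because X is conjugate-symmetric."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def data_bins(n, k_lo, k_hi, voiced, J):
    """Unvoiced block: every bin in the message band; voiced block: the last J bins."""
    if not 1 <= k_lo <= k_hi < n // 2:
        raise ValueError("message band must lie strictly between DC and Nyquist")
    band = list(range(k_lo, k_hi + 1))
    return band[-J:] if voiced else band


def embed_block(block, bits, k_lo, k_hi, voiced, J, alpha):
    """Replace the chosen phases with +pi/2 (bit 0) or -pi/2 (bit 1); scale magnitudes by alpha."""
    X = dft(block)
    n = len(X)
    bins = data_bins(n, k_lo, k_hi, voiced, J)
    if len(bits) != len(bins):
        raise ValueError(f"block carries {len(bins)} bits, got {len(bits)}")
    for k, b in zip(bins, bits):
        phase = math.pi / 2 if b == 0 else -math.pi / 2
        X[k] = alpha * abs(X[k]) * cmath.exp(1j * phase)
        X[n - k] = X[k].conjugate()  # mirror bin keeps the transmitted block real
    return idft(X)


def extract_block(received, k_lo, k_hi, voiced, J, alpha, rng):
    """Read bits from the phase signs, then restore magnitudes and insert random phases."""
    Y = dft(received)
    n = len(Y)
    bins = data_bins(n, k_lo, k_hi, voiced, J)
    bits = []
    for k in bins:
        # Positive imaginary part: phase is nearer +pi/2 than -pi/2, so bit 0.
        bits.append(0 if Y[k].imag > 0 else 1)
        # Undo the assumed gain and discard the data phase in favour of a random one.
        Y[k] = (abs(Y[k]) / alpha) * cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        Y[n - k] = Y[k].conjugate()
    return bits, idft(Y)


if __name__ == "__main__":
    rng = random.Random(0)
    block = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for one speech block
    bits = [rng.randint(0, 1) for _ in range(8)]
    tx = embed_block(block, bits, 20, 27, voiced=False, J=4, alpha=2.0)
    rx_bits, speech = extract_block(tx, 20, 27, voiced=False, J=4, alpha=2.0, rng=rng)
    print(rx_bits == bits)
```

Because only phases are replaced and the magnitudes are restored, the recovered block keeps the original magnitude spectrum even though its waveform differs from the original. This is consistent with the observation above that unvoiced speech is perceived with negligible distortion despite a low waveform signal-to-noise ratio.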

