Abstract

Most speech authentication algorithms are over-optimized for robustness and efficiency at the expense of discrimination. With short hashing sequences, different speech segments are likely to produce the same hash, which causes serious authentication errors. Little research has examined how the length of the hashing sequence affects discrimination, so this paper proposes a long-sequence speech authentication algorithm based on the constant Q transform (CQT) and tensor decomposition (TD). Long hashing sequences address the poor collision resistance of existing algorithms and allow fast, accurate authentication of important speech fragments with large data volumes. The frequency-domain sub-bands are first divided into different matrices, the variance set of the frequency-domain sub-bands is then computed, and the feature values are finally obtained through the CQT and TD transformations. The resulting features are highly robust and can withstand interference from complex channel environments. A database of 51,600 speech recordings, built from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) speech database and Text-to-Speech (TTS) synthesis, is used to evaluate the algorithm. Experimental results show that, compared with existing speech authentication algorithms, the proposed algorithm offers high discrimination, strong robustness, and high efficiency.
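The abstract does not give the CQT settings. The sketch below is a minimal, directly evaluated constant-Q transform of one frame, assuming the standard formulation: centre frequencies f_k = f_min * 2^(k/b) with b bins per octave, and one shared quality factor Q = 1/(2^(1/b) - 1), so low bins get long windows and high bins short ones. Reading b as "bins per octave" and K as "number of bins" is an assumption, and the function name `cqt_frame` and the values of f_min, b and K are hypothetical, not the authors' choices.

```python
import cmath
import math


def cqt_frame(x, fs, f_min, b, K):
    """Directly evaluated constant-Q transform magnitudes of one frame.

    Hypothetical helper for illustration only: f_min, b (assumed to be
    bins per octave) and K (assumed to be the number of bins) are not
    values taken from the paper.
    """
    # One quality factor Q = f_k / bandwidth, shared by every bin
    Q = 1.0 / (2.0 ** (1.0 / b) - 1.0)
    mags = []
    for k in range(K):
        f_k = f_min * 2.0 ** (k / b)                  # geometric centre frequency
        n_k = min(math.ceil(Q * fs / f_k), len(x))    # longer windows at low f_k, capped at frame length
        acc = 0j
        for n in range(n_k):
            # Hamming window of length n_k (a single sample gets weight 1)
            w = 1.0 if n_k == 1 else 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))
            acc += w * x[n] * cmath.exp(-2j * math.pi * f_k * n / fs)
        mags.append(abs(acc) / n_k)                   # normalise by window length
    return mags


# Usage: a 440 Hz tone sampled at 8 kHz, bins from 110 Hz, 12 bins per octave
fs = 8000
frame = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(2048)]
mags = cqt_frame(frame, fs, f_min=110.0, b=12, K=36)
peak = max(range(len(mags)), key=mags.__getitem__)
print(peak, round(110.0 * 2 ** (peak / 12), 1))  # 24 440.0
```

Direct evaluation costs the sum of all window lengths per frame; practical systems usually rely on an FFT-based implementation such as `librosa.cqt`, whose `bins_per_octave` and `n_bins` arguments play the roles assumed here for b and K.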

Highlights

  • With the development of multimedia technology, speech involves huge amounts of data and is characterized by high redundancy and low confidentiality

  • Where hs1 and hs2 represent the hashing long sequences of speech s1 and s2, respectively, and N is the length of the hashing sequence

  • Where M is the length of one speech frame, N is the length of the hashing sequence, q is the number of frequency-domain sub-bands, r is the number of sub-band variance sets, b is a parameter, and K is the number of frequency bands
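The highlights define hs1, hs2 and N with a summation starting at i = 1, but the full formula is not shown here. A common choice in speech perceptual hashing is the normalized Hamming distance (bit error rate), BER = (1/N) * sum over i = 1..N of |hs1(i) - hs2(i)|; the sketch below assumes that measure and a hypothetical threshold `tau` for the authentication decision. Neither the function names nor the threshold come from the paper.

```python
def normalized_hamming(hs1, hs2):
    """Fraction of the N positions where two binary hash sequences differ.

    Assumed distance measure (BER); the paper's exact formula may differ.
    """
    if len(hs1) != len(hs2):
        raise ValueError("hash sequences must have the same length N")
    N = len(hs1)
    return sum(a != b for a, b in zip(hs1, hs2)) / N


def authenticate(hs1, hs2, tau):
    """Hypothetical decision rule: same content if the distance is <= tau."""
    return normalized_hamming(hs1, hs2) <= tau


hs1 = [1, 0, 1, 1, 0, 0, 1, 0]
hs2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(normalized_hamming(hs1, hs2))       # 2 of 8 bits differ -> 0.25
print(authenticate(hs1, hs2, tau=0.3))    # True
```

This also illustrates why sequence length matters for collision resistance: two independent, uniformly random N-bit sequences are identical with probability 2^-N, so each extra bit halves the chance of an accidental exact collision.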


Summary

INTRODUCTION

Y. Huang et al.: Long Sequence Speech Perceptual Hashing Authentication Algorithm Based on CQT and TD

With the development of multimedia technology, speech involves huge amounts of data and is characterized by high redundancy and low confidentiality. Features used in existing speech perceptual hashing work include the discrete wavelet transform (DWT) [10], [13], linear prediction coefficients (LPC) [14], spectrograms [22], [27], formants [24], Bark-frequency cepstral coefficients [29], and multiple fused features. Zhang et al. [11] proposed an efficient perceptual hashing algorithm based on improved spectral entropy for speech authentication. Zhang et al. [13] proposed a high-performance speech perceptual hashing authentication algorithm based on DWT and a measurement matrix. The frequency-band variance can reduce noise interference and enhance the robustness of the algorithm.
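As a rough illustration of the sub-band variance idea, the sketch below splits a frame's magnitude spectrum into q contiguous sub-bands (q as in the highlights), computes each sub-band's variance, and turns neighbouring variances into hash bits. The equal-width split, the comparison-based binarisation and both function names are assumptions for illustration; the paper's matrix layout, CQT/TD stages and quantisation are not described in this excerpt.

```python
from statistics import pvariance


def subband_variances(spectrum, q):
    """Split a magnitude spectrum into q equal, contiguous sub-bands and
    return the population variance of each (hypothetical helper)."""
    if not 1 <= q <= len(spectrum):
        raise ValueError("need 1 <= q <= len(spectrum)")
    size = len(spectrum) // q          # trailing remainder bins are dropped
    return [pvariance(spectrum[i * size:(i + 1) * size]) for i in range(q)]


def variance_bits(variances):
    """Hypothetical binarisation: bit i is 1 when sub-band i + 1 has a larger
    variance than sub-band i. Comparing neighbours rather than thresholding
    absolute values leaves the bits unchanged under a uniform gain change."""
    return [1 if variances[i + 1] > variances[i] else 0
            for i in range(len(variances) - 1)]


spectrum = [1.0, 2.0, 1.0, 2.0, 5.0, 9.0, 5.0, 9.0, 3.0, 3.0, 3.0, 3.0]
variances = subband_variances(spectrum, q=3)   # 0.25, 4.0, 0.0
bits = variance_bits(variances)
print(bits)  # [1, 0]
```

Scaling the whole spectrum by a nonzero constant c multiplies every variance by c^2, so the relative comparisons, and therefore the bits, are preserved; this is one simple way a variance-based feature can tolerate level changes in the channel.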

CONSTANT Q TRANSFORM
TENSOR DECOMPOSITION
EXPERIMENTAL RESULTS AND ANALYSIS
DATASETS
DISCRIMINATION TEST AND ANALYSIS
ROBUSTNESS TEST AND ANALYSIS
PASSING RATE TEST AND ANALYSIS IN REAL NOISE ENVIRONMENT
CONCLUSION