A Long Sequence Speech Perceptual Hashing Authentication Algorithm Based on Constant Q Transform and Tensor Decomposition

Yibo Huang, Yong Wang, Manhong Fan, Hexiang Hou, Yuan Zhang

doi:10.1109/access.2020.2974029

Open Access

Abstract

Most speech authentication algorithms are over-optimized for robustness and efficiency, resulting in poor discrimination. Hashing a shorter sequence makes it likely that the same hashing sequence is produced by different speech segments, which causes serious deviations in authentication. Little research has addressed how the length of the hashing sequence affects discrimination, so this paper proposes a long sequence speech authentication algorithm based on the constant Q transform (CQT) and tensor decomposition (TD). Long hashing sequences are used to address the poor collision resistance of existing algorithms, so that fast and accurate authentication can be achieved for important speech fragments with large data volumes. The sub-bands in the frequency domain are first divided into separate matrices, then the variance set of the frequency-domain sub-bands is obtained, and finally the feature values are obtained by applying CQT and TD. The resulting feature values are strongly robust and can cope with interference from complex channel environments. The Texas Instruments and Massachusetts Institute of Technology (TIMIT) speech database and Text to Speech (TTS) are used to build a database of 51600 speech segments to verify the performance of the algorithm. Experimental results show that, compared with existing speech authentication algorithms, the proposed algorithm offers high discrimination, strong robustness and high efficiency.
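The sub-band variance step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the spectrum is already available as a list of magnitudes and that the sub-bands are contiguous and equally sized; the function name `subband_variances` and the equal-width split are hypothetical choices for illustration, not the paper's stated procedure. The parameter `q` follows the paper's notation for the number of frequency-domain sub-bands.

```python
from statistics import pvariance


def subband_variances(spectrum, q):
    """Split a magnitude spectrum into q contiguous sub-bands and
    return the population variance of each sub-band.

    Assumption: equal-width bands; trailing bins that do not fill
    a complete band are dropped.
    """
    if q <= 0 or q > len(spectrum):
        raise ValueError("q must be between 1 and len(spectrum)")
    band_len = len(spectrum) // q
    return [
        pvariance(spectrum[i * band_len:(i + 1) * band_len])
        for i in range(q)
    ]


# Toy spectrum: a flat first band and an alternating second band.
variances = subband_variances([1, 1, 1, 1, 0, 2, 0, 2], q=2)
```

A flat band yields zero variance, while a band whose energy fluctuates yields a larger value, which is what makes the variance set usable as a coarse feature.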

Highlights

With the development of multimedia technology, speech data has grown to a huge volume and is characterized by high redundancy and low confidentiality
hs1 and hs2 respectively represent the long hashing sequences of s1 and s2; N is the length of the hashing sequence
M is the length of a frame of the speech signal, N is the length of the hashing sequence, q is the number of sub-bands in the frequency domain, r is the number of sub-band variance sets, b is the parameter, and K is the number of frequency bands
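The page does not show the formula that compares hs1 and hs2. Perceptual hashing schemes commonly use the normalized Hamming distance (the fraction of differing bits) for this purpose, so the sketch below assumes that measure and assumes binary hashing sequences of equal length N. The function name `normalized_hamming_distance` is hypothetical.

```python
def normalized_hamming_distance(hs1, hs2):
    """Fraction of positions at which two equal-length hashing
    sequences differ (assumed matching measure, not confirmed by
    the source)."""
    if len(hs1) != len(hs2):
        raise ValueError("hashing sequences must have the same length N")
    n = len(hs1)
    if n == 0:
        raise ValueError("hashing sequences must not be empty")
    # Count mismatching positions and normalize by the sequence length.
    return sum(a != b for a, b in zip(hs1, hs2)) / n


# Two 4-bit hashes that differ in two positions.
distance = normalized_hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```

Under this measure, a distance near 0 suggests the two segments carry the same perceptual content, and a distance near 0.5 is what unrelated binary hashes would be expected to give. A longer N makes accidental near-zero distances between different segments less likely, which is the collision-resistance argument the abstract makes.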

Summary

INTRODUCTION

With the development of multimedia technology, speech data has grown to a huge volume and is characterized by high redundancy and low confidentiality. [...] discrete wavelet transform (DWT) [10], [13], linear prediction coefficients (LPC) [14], spectrograms [22], [27], formants [24], Bark frequency cepstral coefficients [29] and multiple fused features. Zhang et al. [11] proposed an efficient perceptual hashing scheme based on improved spectral entropy for speech authentication. Zhang et al. [13] proposed a high-performance speech perceptual hashing authentication algorithm based on DWT and a measurement matrix. The frequency band variance can reduce noise interference and enhance the robustness of the algorithm.

CONSTANT Q TRANSFORM

TENSOR DECOMPOSITION

EXPERIMENTAL RESULTS AND ANALYSIS

DATASETS

DISCRIMINATION TEST AND ANALYSIS

ROBUSTNESS TEST AND ANALYSIS

PASSING RATE TEST AND ANALYSIS IN REAL NOISE ENVIRONMENT

CONCLUSION


Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2020
Citations: 26
License type: CC BY 4.0
