Abstract

A convolutional neural network (CNN) can only extract local features, and a long short-term memory (LSTM) network requires heavy computation and long processing time and loses more information as the speech length increases. To address this, we exploit the autonomous feature extraction of deep learning and combine a CNN with a bidirectional long short-term memory (BiLSTM) network to present an encrypted speech retrieval method based on deep perceptual hashing and CNN-BiLSTM. First, the proposed method extracts the Log-Mel spectrogram/MFCC features of the original speech and feeds them to the CNN and BiLSTM networks in turn for model training. Second, the trained fusion network model is used to learn deep perceptual features and generate deep perceptual hashing sequences. Finally, the normalized Hamming distance algorithm is used for matching retrieval. To protect speech security in the cloud, a speech encryption algorithm based on a 4D hyperchaotic system is proposed. The experimental results show that the proposed method has good discrimination, robustness, recall and precision compared with existing methods, and good retrieval efficiency and accuracy for longer speech. Meanwhile, the proposed speech encryption algorithm has a large key space to resist exhaustive attacks.
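
The matching step can be illustrated with a minimal sketch. It assumes that each speech clip has already been mapped to a fixed-length binary hash sequence. The function names, the dictionary-based index and the threshold value of 0.25 are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def normalized_hamming(h1, h2):
    """Fraction of positions at which two equal-length binary hashes differ."""
    h1 = np.asarray(h1, dtype=np.uint8)
    h2 = np.asarray(h2, dtype=np.uint8)
    if h1.shape != h2.shape:
        raise ValueError("hash sequences must have the same length")
    return float(np.count_nonzero(h1 != h2)) / h1.size


def retrieve(query_hash, hash_index, threshold=0.25):
    """Return (clip_id, distance) of the closest stored hash.

    hash_index maps a clip identifier to its binary hash sequence.
    threshold is an assumed value: a best distance above it is
    treated as "no match" and clip_id is returned as None.
    """
    best_id, best_dist = None, float("inf")
    for clip_id, stored_hash in hash_index.items():
        dist = normalized_hamming(query_hash, stored_hash)
        if dist < best_dist:
            best_id, best_dist = clip_id, dist
    if best_dist <= threshold:
        return best_id, best_dist
    return None, best_dist
```

A query hash that differs from a stored hash in one of eight bits has a normalized distance of 0.125, so under the assumed threshold it would be reported as a match. In practice the threshold would be chosen from the measured distance distribution of perceptually different clips.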

Highlights

  • Experimental results and performance analysis: in the experiment, we use speech from THCHS-30 [34] as the experimental data; it is an open Chinese speech database published by the Center for Speech and Language Technology (CSLT) of Tsinghua University

  • In the network model training stage, following the definition of perceptual hashing, multimedia digital representations with the same perceptual content are uniquely mapped to a single digital digest

Summary

Introduction

With the increasing popularity of multimedia acquisition equipment and the rapid development of cloud storage, the Internet and other technologies, multimedia data stored in the cloud saves local space for users and facilitates data sharing between different clients, but it also brings difficulties in searching, privacy leakage and data insecurity [1], [2]. Because encryption greatly changes speech features and the volume of speech data keeps growing, encrypted speech is difficult to retrieve. Research on encrypted speech retrieval has therefore attracted the attention of many research institutions and scholars. Traditional encrypted speech retrieval methods use speech perceptual hashing to extract the perceptual features of speech [3]–[7].
