We generate truly mind-boggling amounts of audio data every day simply by using the Internet. This growth increases the complexity of accessing and analyzing audio data in audio-based applications, so frameworks and supporting tools are needed to retrieve audio data and make intelligent decisions in speech processing. However, the non-stationarity and irregularity of audio signals make their segmentation and classification difficult. Audio classification methods are used in many applications, such as speaker identification, gender recognition, music genre classification, and natural sound classification. This work proposes a deep learning method based on long short-term memory (LSTM) networks for the preprocessing, segmentation, and retrieval of audio signals from the GTZAN dataset. Simulation results show that the proposed algorithm effectively improves audio fingerprint-based data retrieval accuracy and overcomes the drawbacks of traditional methods. Compared with existing methods, the proposed LSTM method achieves good results: a precision of 96.54%, recall of 96.15%, accuracy of 98.56%, and F-measure of 0.96. In practice, the proposed audio fingerprint recognition system works effectively in voice applications, especially on heterogeneous portable consumer devices and in distributed online audio systems.
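The abstract does not give implementation details, but the core idea of an LSTM-based audio classifier can be illustrated: a sequence of per-frame audio features (e.g. MFCC frames) is folded into a single hidden state by the LSTM recurrence, and a softmax over that state yields probabilities for the 10 GTZAN genres. The sketch below implements one LSTM forward pass in plain NumPy; all dimensions, weights, and the use of MFCC features are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b, h0, c0):
    """Run a single-layer LSTM over a sequence of feature frames.

    x_seq: (T, d_in) sequence of audio feature vectors (e.g. MFCC frames)
    W: (4H, d_in), U: (4H, H), b: (4H,) stacked gate parameters
    Returns the final hidden state of shape (H,).
    """
    h, c = h0, c0
    H = h0.shape[0]
    for x in x_seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell update
        c = f * c + i * g          # new cell state
        h = o * np.tanh(c)         # new hidden state
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions (assumptions, not from the paper): 13 MFCCs per frame,
# 32 hidden units, 50 frames per clip, 10 GTZAN genre classes.
rng = np.random.default_rng(0)
d_in, H, T, n_classes = 13, 32, 50, 10
W = rng.normal(0, 0.1, (4 * H, d_in))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (n_classes, H))

frames = rng.normal(size=(T, d_in))                # stand-in for MFCC frames
h_final = lstm_forward(frames, W, U, b, np.zeros(H), np.zeros(H))
probs = softmax(W_out @ h_final)                   # genre probabilities
print(probs.shape)
```

In a trained system the weights would of course be learned from labeled GTZAN clips rather than drawn at random; the recurrence and the final softmax read-out are the parts this sketch is meant to show.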