Speech Detection Method Research Articles

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore, it becomes more important as the deep learning-based methods have been developed because they require significant costs while showing high performance in general. The basic idea of the proposed method is using the speech segment detection so as to exclude non-speech segments before denoising. The speech segmentation detection can exclude non-speech segments with a negligible cost, which will be removed in denoising process with a much higher cost, while maintaining the accuracy of denoising. First, we devise a framework to choose the best preprocessing method for denoising based on the speech segment detection for a target environment. For this, we speculate the environments for denoising using different levels of signal-to-noise ratio (SNR) and multiple evaluation metrics. The framework finds the best speech segment detection method tailored to a target environment according to the performance evaluation of speech segment detection methods. Next, we investigate the accuracy of the speech segment detection methods extensively. We conduct the performance evaluation of five speech segment detection methods with different levels of SNRs and evaluation metrics. Especially, we show that we can adjust the accuracy between the precision and recall of each method by controlling a parameter. Finally, we incorporate the best speech segment detection method for a target environment into a denoising process. Through extensive experiments, we show that the accuracy of the proposed scheme is comparable to or even better than that of Wavenet-based denoising, which is one of recent advanced denoising methods based on deep neural networks, in terms of multiple evaluation metrics of denoising, i.e., SNR, STOI, and PESQ, while it can reduce the denoising time of the Wavenet-based denoising by approximately 40–50% according to the used speech segment detection method.

Read full abstract

BackgroundCleft palate patients have inability to produce adequate velopharyngeal closure, which results in hypernasal speech. In clinic, hypernasal speech is assessed through subject assessment by speech language pathologists. Automatic hypernasal speech detection can provide aided diagnoses for speech language pathologists and clinicians. ObjectivesThis study aims to develop Long Short-Term Memory (LSTM) based Deep Recurrent Neural Network (DRNN) system to detect hypernasal speech from cleft palate patients, thus to provide aided diagnoses for clinical operation and speech therapy. Meanwhile, the feature mining and classification abilities of LSTM-DRNN system are explored. MethodsThe utilized speech recordings are 14,544 vowels in Mandarin. Speech data is collected from 144 children (72 children with hypernasality and 72 controls) with the age of 5–12 years old. This work proposes a LSTM based DRNN system to achieve automatic hypernasal speech detection, since LSTM-DRNN can learn short-time dependences of hypernasal speech. The vocal tract based features are fed into LSTM-DRNN to achieve deep mining of features. To verify the feature mining ability of LSTM-DRNN, features projected by LSTM-DRNN are fed into shallow classifiers instead of the following two fully connected layers and a softmax layer. And the features without the projecting process of LSTM-DRNN are directly fed into shallow classifiers as a comparison. Hypernasality-sensitive vowels (/a/, /i/, and /u/) are analyzed for the first time. ResultsThis LSTM-DRNN based hypernasal speech detection method reaches higher detection accuracy than that using shallow classifiers, since LSTM-DRNN mines features through time axis and network depth simultaneously. The proposed LSTM-DRNN based hypernasality detection system reaches the highest accuracy of 93.35%. According to the analysis of hypernasality-sensitive vowels, the experimental result concludes that vowels /i/ and /u/ are the most sensitive vowels to hypernasal speech. ConclusionsThe results show that LSTM-DRNN has robust feature mining ability and classification ability. This is the first work that applies the LSTM-DRNN technique to automatically detect hypernasality in cleft palate speech. The experimental results demonstrate the potential of deep learning on pathologist speech detection.

Read full abstract

Speech Detection Method Research Articles

Related Topics

Articles published on Speech Detection Method

AFP-Conformer: Asymptotic feature pyramid conformer for spoofing speech detection

Multi-class hate speech detection in the Norwegian language using FAST-RNN and multilingual fine-tuned transformers

An explainable deepfake of speech detection method with spectrograms and waveforms

Speech endpoint detection method based on logarithmic energy entropy product of adaptive sub-bands in low signal-to-noise ratio environments

Speech endpoint detection method based on logarithmic energy entropy product of adaptive sub-bands in low signal-to-noise ratio environments

The Impact of Data Pre-Processing on Hate Speech Detection in a Mix of English and Hindi–English (Code-Mixed) Tweets

Subband fusion of complex spectrogram for fake speech detection

Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Speech Fatigue Detection Based on Deep Learning

Token replacement-based data augmentation methods for hate speech detection

User Authentication Method via Speaker Recognition and Speech Synthesis Detection

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.

Speech Signal Detection Based on Bayesian Estimation by Observing Air-Conducted Speech under Existence of Surrounding Noise with the Aid of Bone-Conducted Speech

Hate speech detection in Twitter using hybrid embeddings and improved cuckoo search-based neural networks

A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Voice Pathology Detection and Classification Using Convolutional Neural Network Model

Bidirectional Long Short Term Memory Method and Word2vec Extraction Approach for Hate Speech Detection

Hate Speech Detection in Twitter using Transformer Methods

Discriminative feature based on FWMW for playback speech detection

HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Speech Detection Method Research Articles

Related Topics

Articles published on Speech Detection Method

AFP-Conformer: Asymptotic feature pyramid conformer for spoofing speech detection

Multi-class hate speech detection in the Norwegian language using FAST-RNN and multilingual fine-tuned transformers

An explainable deepfake of speech detection method with spectrograms and waveforms

Speech endpoint detection method based on logarithmic energy entropy product of adaptive sub-bands in low signal-to-noise ratio environments

Speech endpoint detection method based on logarithmic energy entropy product of adaptive sub-bands in low signal-to-noise ratio environments

The Impact of Data Pre-Processing on Hate Speech Detection in a Mix of English and Hindi–English (Code-Mixed) Tweets

Subband fusion of complex spectrogram for fake speech detection

Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Speech Fatigue Detection Based on Deep Learning

Token replacement-based data augmentation methods for hate speech detection

User Authentication Method via Speaker Recognition and Speech Synthesis Detection

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.

Speech Signal Detection Based on Bayesian Estimation by Observing Air-Conducted Speech under Existence of Surrounding Noise with the Aid of Bone-Conducted Speech

Hate speech detection in Twitter using hybrid embeddings and improved cuckoo search-based neural networks

A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Voice Pathology Detection and Classification Using Convolutional Neural Network Model

Bidirectional Long Short Term Memory Method and Word2vec Extraction Approach for Hate Speech Detection

Hate Speech Detection in Twitter using Transformer Methods

Discriminative feature based on FWMW for playback speech detection

HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection