Abstract Dysarthria is a neurological speech disorder that reduces the intelligibility of a speaker's speech. Speech assistive aids are developed to support the communication needs of dysarthric speakers. Successful aids are built on automatic speech recognition systems trained on the speakers' own speech data. The effectiveness of such systems depends on the amount of speech data used for training, but collecting a large amount of dysarthric speech data is difficult. Data augmentation applies transformation techniques to increase the quantity of available speech data, and adding noise is one such transformation. Care is needed when using noise to transform dysarthric speech, since dysarthric speech is already disordered and adding further distortion reduces its quality. With a proper analysis of the noise, however, noise can serve as a source for creating new samples of dysarthric speech data. This paper concentrates on identifying noise characteristics and assessing the suitability of noise as a source for data augmentation in dysarthric speech. Dysarthric speech recognition systems were trained with the noise-augmented dysarthric speech data to evaluate the quality of the augmented data. For dysarthric speakers, especially those in the severe category, the low-frequency noise selection approach resulted in a word error rate (WER) 12.29% lower than that obtained without augmentation.
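As an illustration of noise-based augmentation in general, the following minimal sketch mixes low-frequency noise into a speech signal at a chosen signal-to-noise ratio (SNR). It is not the paper's method: the abstract does not specify how noise is analysed or selected, so the function names `lowpass_noise` and `mix_at_snr`, the 500 Hz cutoff, the 15 dB SNR, and the synthetic sine wave standing in for a dysarthric utterance are all assumptions made for this example.

```python
import numpy as np


def lowpass_noise(n_samples, sample_rate, cutoff_hz, rng):
    """Return white Gaussian noise with content above cutoff_hz removed.

    Hypothetical helper: filtering is done by zeroing FFT bins, which is
    a simple stand-in for whatever noise selection the paper uses.
    """
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0  # keep only the low-frequency bins
    return np.fft.irfft(spectrum, n=n_samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sampling rate in Hz

# Synthetic one-second tone used as a placeholder for a real utterance.
t = np.arange(sample_rate) / sample_rate
speech = 0.1 * np.sin(2 * np.pi * 220.0 * t)

noise = lowpass_noise(len(speech), sample_rate, cutoff_hz=500.0, rng=rng)
augmented = mix_at_snr(speech, noise, snr_db=15.0)
```

Each call with a different noise realisation or SNR yields a new training sample from the same utterance. In practice the SNR would need to be chosen conservatively, since the abstract cautions that added distortion can further degrade already disordered speech.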