Abstract

Data augmentation techniques have proven useful for dealing with limited data in machine learning tasks. Recently, spectrogram data augmentation techniques have been investigated for voice conversion and sound classification tasks, where they have improved performance. However, applying multiple data augmentation techniques within a mini-batch has been observed to degrade performance. Although applying multiple augmentation methods sequentially has yielded performance gains on image data, transferring this approach to spectrogram data leads to a loss of acoustic information. Hence, an alternative approach is needed to exploit multiple augmentation methods effectively in the speech domain. This study addressed these challenges for spoken word recognition in low-resource settings, with augmentation applied within the mini-batch. First, we investigated the effect of individual data augmentation techniques. Second, we investigated the effect of combining multiple data augmentation techniques. Finally, we proposed a new approach that uses an alternate mechanism to exploit multiple spectrogram augmentation techniques more effectively. Our experimental results show that the proposed approach (new pattern) significantly outperforms the sequential approach (traditional pattern) at different dataset scales, including low-resource settings. In addition, the proposed approach achieves an actual speedup of approximately 2x over the sequential approach. A combination of frequency-warping and time-length-control augmentation methods was found to be stable and robust across all datasets evaluated.
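To make the contrast between the two patterns concrete, here is a minimal Python/NumPy sketch under stated assumptions. The abstract does not specify how the proposed mechanism works; this sketch assumes one plausible reading of "alternate", namely that each sample in a mini-batch receives a single augmentation, cycling through the available methods, whereas the sequential (traditional) pattern chains every method on every sample. Simple frequency and time masking stand in for the frequency-warping and time-length-control methods named above, and all function names (`freq_mask`, `time_mask`, `augment_sequential`, `augment_alternating`) are hypothetical.

```python
import numpy as np

# Stand-in augmentations (assumption): simple masks, NOT the paper's
# frequency-warping or time-length-control methods.
def freq_mask(spec, rng, max_width=8):
    """Zero out a random band of frequency bins (rows)."""
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, out.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out

def time_mask(spec, rng, max_width=10):
    """Zero out a random span of time frames (columns)."""
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, out.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out

def augment_sequential(batch, augs, rng):
    """Traditional pattern: every sample passes through every augmentation in order."""
    result = []
    for spec in batch:
        for aug in augs:
            spec = aug(spec, rng)
        result.append(spec)
    return np.stack(result)

def augment_alternating(batch, augs, rng):
    """Hypothetical alternating pattern: sample i receives only augs[i % len(augs)]."""
    return np.stack([augs[i % len(augs)](spec, rng) for i, spec in enumerate(batch)])

rng = np.random.default_rng(0)
# Four toy spectrograms (40 frequency bins x 100 frames), strictly positive
# so that any all-zero row or column must come from a mask.
batch = rng.random((4, 40, 100)) + 0.1
augs = [freq_mask, time_mask]
seq = augment_sequential(batch, augs, rng)
alt = augment_alternating(batch, augs, rng)
```

With two methods, the alternating sketch makes one augmentation call per sample instead of two, and each sample keeps more of its original content because it is distorted only once. This is an illustration of the two patterns, not a reproduction of the paper's method or its measured speedup.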
