Abstract
Purpose: Create and share a MATLAB library that implements data augmentation algorithms for audio data. This study aims to help machine learning researchers improve their models using the algorithms proposed by the authors.
Design/methodology/approach: The authors structured the library into methods that augment raw audio data and methods that augment spectrograms. In the paper, the authors describe the structure of the library and briefly explain how every function works. The authors then perform experiments to show that the library is effective.
Findings: The authors show that the library is efficient using a competitive dataset. They test multiple data augmentation approaches that they propose and show that these approaches improve performance.
Originality/value: A MATLAB library specifically designed for audio data augmentation was not available before. The authors are the first to provide an efficient and parallel implementation of a large number of algorithms.
Highlights
Deep neural networks have achieved state-of-the-art performance in many artificial intelligence fields, such as image classification [1], object detection [2] and audio classification [3].
Given a dataset $X = \{x_{1,1}, \ldots, x_{n_1,1}, x_{1,2}, \ldots, x_{n_2,2}, \ldots, x_{1,M}, \ldots, x_{n_M,M}\}$, where $x_{i,j}$ represents a generic audio sample $i$ from class $j$, we propose to augment $x_{i,j}$ with techniques working on raw audio signals and to augment the spectrogram $S(x_{i,j})$ produced from the same raw audio signals.
We used the function sgram, included in the Large Time-Frequency Analysis Toolbox (LTFAT) [21], to convert raw audio signals into spectrograms.
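The two augmentation routes named in the highlights (augment the raw signal and then take its spectrogram, or augment the spectrogram directly) can be sketched as follows. This is a minimal Python illustration, not the library's MATLAB API. The specific augmentations (random gain with a circular time shift, and masking a band of frequency bins) are assumptions chosen for brevity, the function names are hypothetical, and the naive windowed DFT is only a stand-in for LTFAT's sgram.

```python
import cmath
import math
import random


def augment_raw(signal, rng):
    """Illustrative raw-audio augmentation: random gain plus circular time shift."""
    gain = rng.uniform(0.8, 1.2)
    shift = rng.randrange(len(signal))
    return [gain * signal[(i - shift) % len(signal)] for i in range(len(signal))]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive Hann-windowed DFT (stand-in for sgram).

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def augment_spectrogram(spec, rng, max_width=4):
    """Illustrative spectrogram augmentation: zero a random band of frequency bins."""
    n_bins = len(spec[0])
    width = rng.randint(1, max_width)
    low = rng.randrange(n_bins - width + 1)
    return [[0.0 if low <= k < low + width else v for k, v in enumerate(frame)]
            for frame in spec]


def augment_dataset(samples, n_raw=2, n_spec=2, seed=0):
    """Turn (signal, label) pairs into (spectrogram, label) pairs; labels never change."""
    rng = random.Random(seed)
    out = []
    for signal, label in samples:
        base = spectrogram(signal)
        out.append((base, label))
        # Route 1: augment the raw signal, then convert it to a spectrogram.
        for _ in range(n_raw):
            out.append((spectrogram(augment_raw(signal, rng)), label))
        # Route 2: augment the spectrogram of the original signal.
        for _ in range(n_spec):
            out.append((augment_spectrogram(base, rng), label))
    return out


def tone(cycles, length=256):
    """Toy signal: a sine wave with the given number of cycles over `length` samples."""
    return [math.sin(2 * math.pi * cycles * t / length) for t in range(length)]


# Two toy tones standing in for samples from two classes.
dataset = [(tone(8), 0), (tone(32), 1)]
augmented = augment_dataset(dataset)
```

Each original sample yields its own spectrogram plus `n_raw + n_spec` augmented ones, all carrying the original label, so the training set grows without any new labeling effort.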
Summary
Deep neural networks have achieved state-of-the-art performance in many artificial intelligence fields, such as image classification [1], object detection [2] and audio classification [3]. They usually need a very large amount of labeled data to obtain good results, and these data might not be available because of high labeling costs or the scarcity of samples. Data augmentation is a powerful tool to improve the performance of neural networks. It consists of modifying the original samples to create new ones without changing their labels [4]. When memory is limited, one CNN can be trained with the H AugSA spectrograms and another with the K AugSS spectrograms, and their scores can then be combined by a fusion rule.
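The score-level fusion mentioned above can be illustrated with a short sketch. The source only says "a fusion rule", so the averaging (sum) rule used here is an assumption, and the function names and score values are hypothetical.

```python
def sum_rule(scores_a, scores_b):
    """Fuse two per-class score vectors by averaging them (one possible fusion rule)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both networks must score the same set of classes")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


def predict(fused):
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one test clip over three classes.
scores_cnn_a = [0.50, 0.40, 0.10]  # network trained on the first augmented set
scores_cnn_b = [0.20, 0.70, 0.10]  # network trained on the second augmented set

fused = sum_rule(scores_cnn_a, scores_cnn_b)
label = predict(fused)
```

Because the two networks are trained separately, neither has to hold both augmented sets in memory at once; only their output scores are combined at test time.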