Abstract

The application of machine learning techniques to sound signals requires the prior characterization of those signals. In many cases, they are described using cepstral coefficients that represent the sound spectra. In this paper, the performance of two integral transforms, the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT), in obtaining cepstral coefficients is compared in the context of processing anuran calls. Due to the symmetry of sound spectra, it is shown that the DCT clearly outperforms the DFT and decreases the error in representing the spectrum by more than 30%. Additionally, it is demonstrated that DCT-based cepstral coefficients are less correlated than their DFT-based counterparts, which is a significant advantage if these features are later used in classification algorithms. Since the superiority of the DCT is based on the symmetry of sound spectra and not on any intrinsic advantage of the algorithm, the conclusions of this research can be extrapolated to any sound signal.
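
The mechanism behind this comparison can be illustrated with a minimal sketch. It keeps only the first few DFT or DCT coefficients of a log-spectrum, reconstructs the spectrum from them, and measures the reconstruction error. The envelope `log_spec`, the number of retained coefficients `k`, and the helper names `truncate_dft`, `truncate_dct`, and `rmse` are assumptions made for illustration; this is not the paper's data or procedure, and it does not attempt to reproduce the 30% figure.

```python
import numpy as np
from scipy.fft import dct, idct


def truncate_dft(s, k):
    """Keep the k lowest-index DFT bins of a real sequence, then invert."""
    c = np.fft.rfft(s)
    c[k:] = 0
    return np.fft.irfft(c, n=s.size)


def truncate_dct(s, k):
    """Keep the first k DCT-II coefficients, then invert."""
    c = dct(s, type=2, norm="ortho")
    c[k:] = 0
    return idct(c, type=2, norm="ortho")


def rmse(a, b):
    """Root-mean-square difference between two sequences."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical smooth log-spectrum on the non-negative frequencies
# (a falling slope plus one resonance peak). NOT anuran data.
m = 256
x = np.arange(m) / m
log_spec = -3.0 * x + np.exp(-0.5 * ((x - 0.3) / 0.06) ** 2)

# Assumed number of retained coefficients. The k DFT bins are complex,
# so the DFT actually gets more real-valued parameters than the DCT here.
k = 20
err_dft = rmse(log_spec, truncate_dft(log_spec, k))
err_dct = rmse(log_spec, truncate_dct(log_spec, k))
print(f"RMS error with {k} coefficients: DFT={err_dft:.4f}, DCT={err_dct:.4f}")
```

For this envelope the DCT error comes out smaller. The two ends of the envelope differ, so the periodic repetition implied by the DFT has a jump at the boundary, whereas the mirrored repetition implied by the DCT is continuous. How large the gap is on real recordings depends on the signal.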

Highlights

  • Automatic processing of sound signals is a very active topic in many fields of science and engineering, with applications in multiple areas such as speech recognition [1], speaker identification [2,3], emotion recognition [4], music classification [5], outlier detection [6], classification of animal species [7,8,9], detection of biomedical disease [10], and design of medical devices [11].

  • A recent survey of techniques employed in sound feature extraction can be found in [17]; notable among them are Spectrum-Temporal Parameters (STPs) [18], Linear Prediction Coding (LPC) coefficients [19], Linear Frequency Cepstral Coefficients (LFCC) [20], the Pseudo Wigner-Ville Transform (PWVT) [21], and entropy coefficients [22].

  • Repetition of f or, in contrast, a periodic repetition of f and its symmetric counterpart. Although this is a general question, we have addressed it in the context of a specific application: characterizing anuran calls for their subsequent classification.
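
The two extensions named in the last highlight can be made concrete. The DFT treats a finite sequence as one period of a periodic repetition, whereas the DCT-II corresponds to repeating the sequence together with its mirror image. The sketch below checks this numerically: the DCT-II of a sequence equals the DFT of the mirrored sequence, up to a half-sample phase factor. The random sequence `f` and all variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
f = rng.standard_normal(8)  # arbitrary finite sequence standing in for f
n = f.size

# One period of the symmetric extension: f followed by its mirror image.
mirrored = np.concatenate([f, f[::-1]])
spectrum = np.fft.fft(mirrored)[:n]

# Half-sample phase shift that centres the symmetry axis of the extension.
k = np.arange(n)
shifted = np.exp(-1j * np.pi * k / (2 * n)) * spectrum

# Unnormalised DCT-II computed directly on f.
dct_direct = dct(f, type=2)

real_matches = np.allclose(shifted.real, dct_direct)
imag_vanishes = np.allclose(shifted.imag, 0.0)
print(real_matches, imag_vanishes)
```

Both checks hold for any real `f`, because the mirrored sequence is even about its midpoint and its phase-shifted DFT is therefore purely real.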

Summary

Introduction

Automatic processing of sound signals is a very active topic in many fields of science and engineering, with applications in multiple areas such as speech recognition [1], speaker identification [2,3], emotion recognition [4], music classification [5], outlier detection [6], classification of animal species [7,8,9], detection of biomedical disease [10], and design of medical devices [11]. The Mel-Frequency Cepstral Coefficients (MFCC) [23] are probably the most widely employed set of features in sound characterization, and the majority of the sound processing applications mentioned above are based on their use. These features have also been successfully employed in other fields, such as the analysis of electrocardiogram (ECG) signals [24], gait analysis [25,26], and disturbance interpretation in power grids [27].
