AVID: A speech database for machine learning studies on vocal intensity

Paavo Alku, Manila Kodali, Laura Laaksonen, Sudarsana Reddy Kadiri

doi:10.1016/j.specom.2024.103039

Abstract

Vocal intensity, which is typically quantified with the sound pressure level (SPL), is a key feature of speech. To measure SPL from speech recordings, a standard calibration tone (with a reference SPL of 94 dB or 114 dB) needs to be recorded together with the speech. However, most of the popular databases used in areas such as speech and speaker recognition have been recorded without calibration information, with speech expressed on arbitrary amplitude scales. Therefore, information about the vocal intensity of the recorded speech, including SPL, is lost. In the current study, we introduce a new open and calibrated speech/electroglottography (EGG) database named the Aalto Vocal Intensity Database (AVID). AVID includes speech and EGG produced by 50 speakers (25 males, 25 females) who varied their vocal intensity in four categories (soft, normal, loud and very loud). Recordings were conducted using a constant mouth-to-microphone distance and by recording a calibration tone. The speech data were labelled sentence-wise using a total of 19 labels that support the utilisation of the data in machine learning (ML)-based studies of vocal intensity based on supervised learning. To demonstrate how the AVID data can be used to study vocal intensity, we investigated one multi-class classification task (classification of speech into soft, normal, loud and very loud intensity classes) and one regression task (prediction of the SPL of speech). In both tasks, we deliberately warped the level of the input speech by normalising the signal to have its maximum amplitude equal to 1.0; that is, we simulated a scenario that is prevalent in current speech databases. The results show that using the spectrogram feature with the support vector machine classifier gave an accuracy of 82% in the multi-class classification of the vocal intensity category. In the prediction of SPL, using the spectrogram feature with the support vector regressor gave a mean absolute error of about 2 dB and a coefficient of determination of 92%. We invite researchers interested in classification and regression problems to utilise AVID in the study of vocal intensity, and we hope that the current results can serve as baselines for future ML studies on the topic.
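
The role of the calibration tone, and why peak normalisation discards level information, can be illustrated with a minimal sketch. It assumes the standard relation SPL_signal = SPL_cal + 20·log10(RMS_signal / RMS_cal), which holds only if the signal and the calibration tone pass through the same microphone, preamplifier and gain setting. This is not the AVID authors' processing code: the helper names (sine, rms, spl_from_calibration, peak_normalise) are hypothetical, and synthetic sinusoids stand in for real recordings.

```python
import math

FS = 16000  # sampling rate in Hz (assumed for this demo)


def sine(amplitude, freq_hz=1000.0, dur_s=1.0, fs=FS):
    """Hypothetical helper: a synthetic sinusoid standing in for a recording."""
    n = int(dur_s * fs)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def spl_from_calibration(signal, cal_tone, cal_spl_db=94.0):
    """Estimate the SPL (dB) of `signal` from a calibration tone of known SPL.

    Assumes both were recorded with the same microphone, preamplifier and
    gain, so that digital amplitude is proportional to sound pressure.
    """
    return cal_spl_db + 20.0 * math.log10(rms(signal) / rms(cal_tone))


def peak_normalise(x):
    """Scale x so that its maximum absolute amplitude equals 1.0."""
    peak = max(abs(s) for s in x)
    return [s / peak for s in x]


if __name__ == "__main__":
    cal = sine(0.1)    # calibration tone, assumed recorded at 94 dB SPL
    soft = sine(0.01)  # 20 dB below the tone
    loud = sine(0.2)   # about 6 dB above the tone
    print(f"soft: {spl_from_calibration(soft, cal):.1f} dB")  # soft: 74.0 dB
    print(f"loud: {spl_from_calibration(loud, cal):.1f} dB")  # loud: 100.0 dB
    # Peak normalisation scales both signals to the same level,
    # so the calibration-based estimate no longer separates them.
    for name, x in (("soft", soft), ("loud", loud)):
        level = spl_from_calibration(peak_normalise(x), cal)
        print(f"{name} normalised: {level:.1f} dB")
```

After normalisation, the soft and loud stand-ins yield the same value, so any prediction of SPL from peak-normalised speech has to rely on cues other than absolute amplitude, such as the spectrogram features used in the study.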

Journal: Speech Communication
Publication Date: Jan 23, 2024
License type: CC BY