Abstract

Speech personality recognition relies on training models that require an excessive number of features and are, in most cases, designed for specific databases. As a result, overfitted classifiers are not always reliable when tested on different datasets, because their accuracy varies with the speaker domain. Moreover, personality annotations are often subjective, which introduces variability in raters' perception during labeling. These problems limit the effectiveness of speech personality recognition applications. To reduce the unexplained variance caused by unknown differences in raters' perception, a structure that uses the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is proposed. Furthermore, a feature extraction method is proposed that filters out undesirable adulterations, whether noise, silence, or uncertain pitch segments, while extracting essential audio features, i.e., signal power roll-off, pitch, and pause rate. Experiments on the standard SSPNet dataset record a 4% relative increase in overall accuracy when log-likelihood-based annotations are used. Moreover, accuracy is more consistent when the method is tested on male and female subsets.
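
The abstract names pause rate as one of the extracted features but does not define it. As an illustration only, the following minimal sketch assumes that pause rate means the number of pauses per second, where a pause is a run of consecutive low-energy frames. The function names, frame length, energy threshold, and minimum pause length are all hypothetical choices made for this example; they are not taken from the paper and do not represent the authors' method.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_rate(samples, sample_rate, frame_len=160,
               energy_threshold=1e-4, min_pause_frames=3):
    """Pauses per second (assumed definition, not the paper's).

    A pause is counted once for each run of at least
    `min_pause_frames` consecutive frames whose energy is below
    `energy_threshold`. All defaults are illustrative.
    """
    pauses = 0
    run = 0
    for energy in frame_energies(samples, frame_len):
        if energy < energy_threshold:
            run += 1
            # Count the run exactly once, when it first becomes long enough.
            if run == min_pause_frames:
                pauses += 1
        else:
            run = 0
    duration = len(samples) / sample_rate
    return pauses / duration if duration > 0 else 0.0


# Synthetic signal: 1 s of "speech", 0.5 s of silence, repeated twice.
speech = [0.5, -0.5] * 800   # 1600 samples at 1600 Hz
silence = [0.0] * 800        # 800 samples at 1600 Hz
signal = speech + silence + speech + silence
rate = pause_rate(signal, sample_rate=1600)
```

On real recordings the threshold would need to be tuned to the noise floor, and the frame length chosen relative to the sampling rate; the values above only suit the synthetic signal.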
