Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

Zhen-Tao Liu, Wei-Hua Cao, Abdul Rehman, Man Hao, Min Wu

doi:10.1109/tmm.2020.3025108

Abstract

Speech personality recognition relies on training models that require an excessive number of features and are, in most cases, designed specifically for certain databases. As a result, overfitted classifier models are not always reliable when tested on different datasets, because their accuracy varies as the speaker domain changes. Moreover, personality annotations are often subjective, which introduces variability in raters' perception during labeling. These problems limit the effectiveness of speech personality recognition applications. To reduce the unexplained variance caused by unknown differences in raters' perception, a structure that uses the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is proposed. Furthermore, a feature extraction method is proposed that filters out undesirable segments, such as noise, silence, or segments with uncertain pitch, while extracting essential audio features, i.e., signal power roll-off, pitch, and pause rate. Experiments on the standard SSPNet dataset record a relative 4% increase in overall accuracy when log-likelihood-based annotations are used. Moreover, accuracy is more consistent when the method is tested on the male and female subsets.
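The feature-extraction idea (discard silent and pitch-uncertain frames, then compute power roll-off, pitch, and pause rate) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 25 ms frames with a 10 ms hop, the -40 dB silence threshold, the 85% roll-off fraction, the definition of pause rate as speech-to-silence transitions per second, and the autocorrelation pitch tracker that rejects frames with weak periodicity are all illustrative assumptions, and the helper names `frame_signal`, `frame_pitch`, and `extract_features` are hypothetical. The BIRCH-based annotation clustering is not covered.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping the incomplete tail."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def frame_pitch(frame, sr, fmin=75.0, fmax=400.0, min_corr=0.5):
    """Autocorrelation pitch estimate in Hz, or NaN when periodicity is weak.

    Assumption: a normalized autocorrelation peak below `min_corr` is treated
    as an "uncertain pitch" frame and rejected.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return np.nan
    ac = ac / ac[0]  # normalize so that lag 0 equals 1
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag if ac[lag] >= min_corr else np.nan


def extract_features(x, sr, frame_ms=25.0, hop_ms=10.0, silence_db=-40.0, rolloff=0.85):
    """Hypothetical feature extractor; all default parameters are assumptions."""
    x = np.asarray(x, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)

    # 1) Silence filtering: frame energy in dB relative to the loudest frame.
    energy = np.sum(frames ** 2, axis=1) + 1e-12
    energy_db = 10.0 * np.log10(energy / energy.max())
    speech = energy_db > silence_db

    # 2) Pause rate: speech-to-silence transitions per second of audio.
    n_pauses = int(np.count_nonzero(speech[:-1] & ~speech[1:]))
    pause_rate = n_pauses / (len(x) / sr)

    # 3) Spectral power roll-off, computed on Hann-windowed speech frames only.
    windowed = frames[speech] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    cum = np.cumsum(power, axis=1)
    k = np.argmax(cum >= rolloff * cum[:, -1:], axis=1)  # first bin reaching the fraction
    mean_rolloff_hz = float(freqs[k].mean()) if speech.any() else float("nan")

    # 4) Pitch on speech frames; frames with uncertain pitch (NaN) are discarded.
    pitches = np.array([frame_pitch(f, sr) for f in frames[speech]])
    certain = pitches[~np.isnan(pitches)]
    median_pitch_hz = float(np.median(certain)) if certain.size else float("nan")

    return {
        "n_pauses": n_pauses,
        "pause_rate": pause_rate,
        "speech_ratio": float(speech.mean()),
        "mean_rolloff_hz": mean_rolloff_hz,
        "median_pitch_hz": median_pitch_hz,
    }
```

For example, on a synthetic 16 kHz signal made of two 0.5 s, 200 Hz tones separated by 0.3 s of silence and followed by 0.2 s of silence, the sketch counts two pauses (about 1.33 pauses per second) and reports a median pitch close to 200 Hz. Roll-off and pitch are computed only on frames that pass the silence filter, so silent stretches do not bias those statistics.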
