Abstract

With the growing complexity of text-to-speech systems, it is becoming increasingly important to understand the perceptual and judgement processes that drive user Quality-of-Experience (QoE) perception. Typical QoE assessment techniques, such as listening tests with self-report ratings, are useful but provide limited insight into these underlying processes. Recent advances in neuroimaging and physiological monitoring technologies, however, have made it possible to better understand and measure QoE perception. In this paper, we explore the use of two neuroimaging techniques, namely electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), to better understand the neuronal and cerebral haemodynamic changes resulting from synthesized speech of varying quality. Neural correlates of several QoE dimensions were derived and validated on the publicly available PhySyQX database. The fusion of EEG, fNIRS, and fNIRS-derived physiological parameters, combined with conventional features extracted from the synthesized speech signal, was shown to accurately represent several QoE dimensions, including those related to listener affective states. It is hoped that these findings will help researchers build better instrumental QoE models that incorporate technological, contextual, and human influence factors.
