What can phone attractors in RPS tell us? A study of dynamic information in speech signals for phone classification purposes

Yasser Shekofteh

doi:10.1016/j.apacoust.2023.109534

Abstract

The speech production system is time-varying, multidimensional, and nonlinear. Most spoken feature extraction (SFE) techniques, i.e., tools for extracting information from speech signals, rely on the linear aspects of this system. Over the past two decades, several techniques have been developed to capture its nonlinear characteristics using speech attractors embedded in the reconstructed phase space (RPS). However, despite the clear benefits of representing speech in the RPS domain, only a few studies have applied it successfully to classification tasks. The main goal of this study is to develop an RPS-based framework that exploits the dynamic information of the embedded speech attractors and outperforms time-domain SFE techniques. The extracted features are based on multivariate linear prediction models of phone trajectories, which capture the dynamic information of the embedded speech attractor in the RPS. Experiments on the FARSDAT and TIMIT databases evaluate the phone classification accuracy of the proposed framework and show that the dynamic information of phone attractors can significantly improve phone classification accuracy.
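To make the two core ideas of the abstract concrete (embedding a phone segment as a trajectory in the RPS, then describing that trajectory's dynamics with a multivariate linear prediction model), the following is a minimal NumPy sketch. It is not the paper's implementation. It assumes standard time-delay embedding with a hypothetical embedding dimension `dim` and delay `tau`, and a least-squares fit of a vector autoregressive predictor of hypothetical order `order`, whose flattened coefficient matrices serve as the feature vector. The function names `embed`, `mvlp_coefficients`, and `phone_feature_vector` are illustrative only.

```python
import numpy as np


def embed(x, dim=3, tau=6):
    """Time-delay embedding of a 1-D signal into an RPS trajectory.

    Row n is [x[n], x[n + tau], ..., x[n + (dim - 1) * tau]].
    dim and tau are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    return np.stack([x[i * tau: i * tau + n_points] for i in range(dim)], axis=1)


def mvlp_coefficients(traj, order=2):
    """Least-squares fit of a multivariate linear predictor on a trajectory.

    Models traj[n] ~ sum_{k=1..order} A_k @ traj[n - k] and returns the
    matrices stacked with shape (order, dim, dim); index k-1 holds A_k.
    """
    traj = np.asarray(traj, dtype=float)
    n, dim = traj.shape
    if n <= order:
        raise ValueError("trajectory too short for the chosen order")
    # Regressors: the `order` previous points side by side, lag 1 first.
    X = np.hstack([traj[order - k: n - k] for k in range(1, order + 1)])
    Y = traj[order:]
    # Solve Y ~ X @ W; W has shape (order * dim, dim).
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Block k of W multiplies traj[n - (k + 1)] as a row vector,
    # so the column-vector predictor matrix is its transpose.
    return np.stack([W[k * dim:(k + 1) * dim].T for k in range(order)])


def phone_feature_vector(x, dim=3, tau=6, order=2):
    """Flatten the predictor matrices of one phone segment into a feature vector."""
    return mvlp_coefficients(embed(x, dim, tau), order).ravel()
```

For example, `phone_feature_vector(segment, dim=3, tau=6, order=2)` yields an 18-dimensional vector (2 lags × 3 × 3 coefficients) per phone segment, which a separate classifier could then consume. Fitting the predictor by ordinary least squares is the simplest choice; the paper may use a different estimator, normalization, or parameter setting.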
