People diagnosed with Parkinson’s (PwP) exhibit a diverse manifestation of heterogeneous symptoms which likely reflect different subtypes. However, there is no widely accepted consensus on the criteria for subtype membership assignment. We explored clustering PwP using a data-driven approach mining speech signals. We used data from the three English-speaking cohorts (Boston, Oxford, Toronto) in the Parkinson’s Voice Initiative (PVI), where speech and basic demographic information were collected over the standard telephone network. We acoustically characterized 2097 sustained vowel /a/ recordings from 1138 PwP (Boston cohort) using 307 dysphonia measures. We applied unsupervised feature selection to select a concise subset of the dysphonia measures and hierarchical clustering combined with 2D-data projections using t-distributed stochastic neighbor embedding (t-SNE) to facilitate visual exploration of PwP groups. We assessed cluster validity and consistency using silhouette plots and the cophenetic correlation coefficient. We externally validated cluster findings on the Oxford and Toronto PVI cohorts (n = 285 and 107 participants, respectively). We selected 21 dysphonia measures and found four main clusters which provide tentative insights into different dominating speech-associated characteristics (cophenetic coefficient = 0.72, silhouette score = 0.67). The cluster findings were consistent across the three PVI cohorts, strongly supporting the generalization of the presented methodology towards PwP subtype assignment, and were independently visually verified in 2D projections with t-SNE. The presented methodology with mining sustained vowels and clustering may provide an objective and streamlined approach towards informing PwP subtype assignment. This may have important implications towards developing more personalized clinical management of symptoms for PwP.
Read full abstract