Objective: This paper introduces and evaluates KeyGAN, a keystroke data synthesizer based on generative modeling. KeyGAN is designed to generate realistic synthetic keystroke data that captures the nuances of the fine motor control and cognitive processes governing finger-keyboard kinematics, with the aim of supporting the development of biomarkers for psychomotor impairment caused by neurodegeneration.
Methods: The study pursues two primary objectives: (i) to ensure high realism in the synthetic distributions of keystroke features, and (ii) to assess KeyGAN’s ability to replicate the subtleties of natural typing in ways that can enhance biomarker development. The quality of the synthetic keystroke data produced by KeyGAN is evaluated with two keystroke-based applications, TypeNet and nQiMechPD, which serve as ‘referee’ controls. KeyGAN is compared with a reference random Gaussian generator on two tasks: fooling the biometric authentication method TypeNet, and characterizing fine motor impairment in Parkinson’s Disease (PD) using nQiMechPD.
Results: KeyGAN outperformed the reference comparator in fooling TypeNet. With nQiMechPD, it also approximated real data more closely than the reference comparator, showing that it can mimic early signs of Parkinson’s Disease in natural typing. Almost 20% of the real PD samples in the training set could be replaced with KeyGAN’s synthetic data without a decline in classification performance on the real test set. The low Fréchet Distance (<0.03) and Kullback–Leibler Divergence (<700) between KeyGAN outputs and the real data distributions further indicate how closely KeyGAN reproduces real typing data.
Conclusion: KeyGAN shows strong potential as a realistic keystroke data synthesizer, reproducing complex typing patterns relevant to biomarkers for neurological disorders such as Parkinson’s Disease.
Because its synthetic data can supplement real data when training algorithms without degrading performance, KeyGAN holds significant promise for advancing research on digital biomarkers for neurodegenerative and psychomotor disorders.
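The abstract reports distributional similarity with the Fréchet Distance and the Kullback–Leibler Divergence and benchmarks KeyGAN against a random Gaussian generator, but it does not state how either metric is computed. As an illustration only, the Python sketch below assumes one common reading: both metrics are applied to a single keystroke timing feature (e.g. hold time), the Fréchet Distance uses the closed form for Gaussians fitted to each sample (the 1-D case of the FID formula), and the KL divergence is estimated from smoothed histograms on shared bins. The function names `frechet_distance_1d`, `kl_divergence_hist`, and `gaussian_reference`, the bin count, and the lognormal toy data are hypothetical choices, not KeyGAN’s code, so the values it prints are not comparable to the thresholds reported above.

```python
import math
import random
import statistics
from typing import List, Sequence


def frechet_distance_1d(real: Sequence[float], synth: Sequence[float]) -> float:
    """Frechet distance (squared form, as in FID) between 1-D Gaussians fitted to each sample.

    For N(mu_r, var_r) and N(mu_s, var_s) this is
    (mu_r - mu_s)^2 + var_r + var_s - 2 * sqrt(var_r * var_s).
    """
    mu_r, mu_s = statistics.mean(real), statistics.mean(synth)
    var_r, var_s = statistics.pvariance(real), statistics.pvariance(synth)
    return (mu_r - mu_s) ** 2 + var_r + var_s - 2.0 * math.sqrt(var_r * var_s)


def kl_divergence_hist(real: Sequence[float], synth: Sequence[float],
                       bins: int = 20, eps: float = 1e-9) -> float:
    """KL(real || synth) estimated from histograms over a shared set of equal-width bins."""
    lo = min(min(real), min(synth))
    hi = max(max(real), max(synth))
    width = (hi - lo) / bins or 1.0  # guard against all-identical values

    def smoothed_hist(xs: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in xs:
            # Clamp so the maximum value falls into the last bin.
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Add eps so empty bins do not produce log(0) or division by zero, then renormalize.
        probs = [c / len(xs) + eps for c in counts]
        total = sum(probs)
        return [p / total for p in probs]

    p, q = smoothed_hist(real), smoothed_hist(synth)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def gaussian_reference(real: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Hypothetical baseline: i.i.d. draws from a Gaussian matched to the real mean and std."""
    rng = random.Random(seed)
    mu, sigma = statistics.mean(real), statistics.pstdev(real)
    # No truncation: this baseline can emit negative durations, which real timings never have.
    return [rng.gauss(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical right-skewed hold times in seconds; illustrative only, not data from the paper.
    real = [rng.lognormvariate(math.log(0.1), 0.4) for _ in range(2000)]
    baseline = gaussian_reference(real, 2000, seed=1)
    print(f"FD = {frechet_distance_1d(real, baseline):.5f}")
    print(f"KL = {kl_divergence_hist(real, baseline):.4f}")
```

Because the Gaussian-fitted Fréchet Distance compares only the first two moments, a moment-matched Gaussian baseline scores close to zero on it for a single feature; the histogram-based KL term is what registers shape differences such as the right skew of the toy hold times, which illustrates why a single moment-based metric can be uninformative on its own.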