Background and objective
Neurodegenerative diseases are the most frequent age-related diseases. If not detected at an early stage, they compromise the quality of life of the affected subject, so a timely diagnosis is of paramount importance. One of the tasks neurologists most commonly use to detect the disease and determine its severity is the analysis of human gait. This work presents the “Beside Gait” dataset, which contains time series of body-joint coordinates extracted from people with neurodegenerative diseases at various stages, as well as from control subjects. In addition, the novel Multi-Speed Transformer technique is presented and benchmarked against several other deep learning and shallow learning techniques. The objective is to recognize subjects affected by some form of neurodegenerative disease at an early stage using a deep-learning-based computer vision technique that can be integrated into a smartphone app for offline inference, with the aim of promptly initiating investigations and treatment to improve the patient's quality of life.

Methods
The recorded videos were processed, and the skeleton of the person in each video was extracted using pose estimation. The raw time-series coordinates of the joints extracted by the pose estimation algorithm were fed to novel deep neural network architectures and to shallow learning techniques. The proposed Multi-Speed Transformer is benchmarked against other deep neural networks, such as Temporal Convolutional Neural Networks and Transformers, and against shallow learning techniques that combine feature extraction with classifiers such as Random Forests, K Nearest Neighbours, AdaBoost, and linear and RBF SVMs. The Multi-Speed Transformer architecture was developed to learn both short-term and long-term patterns in order to model the various pathological gaits.
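To make the shallow learning baseline more concrete, the following is a minimal sketch of a "feature extraction plus classifier" pipeline applied to joint-coordinate time series. It is an illustration under stated assumptions, not the pipeline used in this work: the features (per-coordinate mean and standard deviation), the helper names `extract_features`, `knn_predict`, and `toy_sequence`, and the toy trajectories are all hypothetical, and only a plain k-nearest-neighbours classifier is shown.

```python
import math
from collections import Counter


def extract_features(sequence):
    """Summarise a sequence of frames (each a flat list of joint coordinates).

    Returns the mean and standard deviation of every coordinate.
    These features are an assumption chosen for illustration only.
    """
    n_frames = len(sequence)
    n_coords = len(sequence[0])
    features = []
    for c in range(n_coords):
        values = [frame[c] for frame in sequence]
        mean = sum(values) / n_frames
        variance = sum((v - mean) ** 2 for v in values) / n_frames
        features.extend([mean, math.sqrt(variance)])
    return features


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    distances = sorted(
        (math.dist(f, query), label)
        for f, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def toy_sequence(amplitude, n_frames=20):
    """Made-up trajectory of a single (x, y) joint; NOT real gait data."""
    return [
        [amplitude * math.sin(0.5 * t), amplitude * math.cos(0.5 * t)]
        for t in range(n_frames)
    ]


# Arbitrary toy training set: labels and amplitudes carry no clinical meaning.
train_sequences = [toy_sequence(1.0), toy_sequence(1.1),
                   toy_sequence(0.3), toy_sequence(0.4)]
train_labels = ["control", "control", "patient", "patient"]
train_features = [extract_features(s) for s in train_sequences]

prediction = knn_predict(
    train_features, train_labels, extract_features(toy_sequence(0.35)), k=3
)
print(prediction)
```

The deep learning models in the comparison differ from this sketch in that they consume the raw coordinate time series directly, without a hand-crafted feature extraction step.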
Results
The Multi-Speed Transformer outperformed all other models, reaching an accuracy of 96.9%, a sensitivity of 96.9%, a precision of 97.7%, and a specificity of 97.1% in binary classification. In multi-class classification, which detects the presence of the disease at its various stages, the accuracy is 71.6%, the sensitivity is 67.7%, and the specificity is 71.8%. In addition, tests were conducted under exactly the same conditions on two other activity recognition datasets, namely SHREC and JHMDB. On both, the Multi-Speed Transformer outperformed all other tested techniques, as well as the techniques reviewed in the state of the art, with accuracies of 91.8% and 74%, respectively. Because those datasets have more than two classes, specificity was not computed for them.

Conclusions
The Multi-Speed Transformer is a valuable technique for neurodegenerative disease assessment through computer vision. In addition, the novel “Beside Gait” dataset presented here is an important starting point for future research on the automatic recognition of neurodegenerative diseases using gait analysis.
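For readers less familiar with the binary-classification metrics reported in the Results, the sketch below shows how accuracy, sensitivity, precision, and specificity follow from the four cells of a confusion matrix using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the experiments in this work.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: affected subjects classified correctly/incorrectly.
    tn/fp: control subjects classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all correct decisions
        "sensitivity": tp / (tp + fn),      # affected subjects that are detected
        "precision": tp / (tp + fp),        # positive predictions that are right
        "specificity": tn / (tn + fp),      # control subjects that are cleared
    }


# Made-up counts for illustration only; not results from the paper.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are the two quantities of most interest for screening: the former reflects how many affected subjects are caught, the latter how many healthy subjects are spared a false alarm.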