Objectives: Neuropsychological tests (NPTs) are standard tools for assessing cognitive function. They can evaluate a subject’s cognitive status, but interpreting them can be time-consuming and expensive. This paper therefore aimed to optimize the systematic NPTs using machine learning and to develop new classification models that differentiate healthy controls (HC), mild cognitive impairment (MCI), and Alzheimer’s disease dementia (ADD) among groups of subjects.

Patients and methods: A dataset of 14,926 subjects was obtained from 46 formal NPTs based on the Seoul Neuropsychological Screening Battery (SNSB). Subjects had an age of 70.18 ± 7.13 years and an education level of 8.18 ± 5.50, and belonged to one of three diagnostic groups: HC, MCI, or ADD. The dataset was preprocessed and then classified with two- and three-way machine-learning classifiers from scikit-learn (www.scikit-learn.org) to differentiate HC versus MCI, HC versus ADD, HC versus cognitive impairment (CI; MCI + ADD), and HC versus MCI versus ADD. We compared the performance of seven machine-learning algorithms: Naïve Bayes (NB), random forest (RF), decision tree (DT), k-nearest neighbors (KNN), support vector machine (SVM), AdaBoost, and linear discriminant analysis (LDA). For each model, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the curve (AUC), confusion matrices, and receiver operating characteristic (ROC) curves were obtained on the test dataset.
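As an illustration of the comparison described in the methods, the following is a minimal sketch of fitting the seven algorithm families with scikit-learn and scoring them on a held-out test split. It is not the study's pipeline: the data are synthetic (generated with `make_classification` as a stand-in for the SNSB dataset), and the 80/20 split, the Gaussian variant of Naïve Bayes, the default RBF kernel for the SVM, and all default hyperparameters are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the SNSB data (assumption): 46 features and three
# imbalanced classes playing the role of HC / MCI / ADD.
X, y = make_classification(
    n_samples=600, n_features=46, n_informative=10, n_classes=3,
    weights=[0.2, 0.5, 0.3], random_state=0,
)

# Assumed 80/20 stratified split; the abstract does not state the split used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One estimator per algorithm family named in the methods, default settings.
models = {
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

# Fit each model on the training split and score it on the test split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from most to least accurate.
for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} accuracy = {acc:.3f}")
```

The ranking printed for synthetic data says nothing about the ranking on real NPT scores; the sketch only shows the shape of a like-for-like comparison in which every model sees the same training and test splits.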
Results: Models trained on the 29 best-selected NPT features were evaluated. The RF model yielded the best accuracy, sensitivity, specificity, PPV, NPV, and AUC in all four classification tasks for predicting cognitive impairment among subjects: 98%, 98%, 97%, 98%, 97%, and 99%, respectively, for HC versus MCI; 98%, 99%, 96%, 97%, 98%, and 99% for HC versus ADD; 97%, 99%, 92%, 97%, 97%, and 99% for HC versus CI; and 97%, 96%, 98%, 97%, 98%, and 99% for HC versus MCI versus ADD.

Conclusion: According to the results, RF was the best classification model for both two- and three-way classification among the seven algorithms trained on an imbalanced SNSB NPT dataset. The trained models proved useful for diagnosing MCI and ADD in patients with normal NPTs. These models can optimize cognitive evaluation, enhance diagnostic accuracy, and reduce missed diagnoses.
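For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, PPV, and NPV follow from the four cells of a binary confusion matrix. The helper name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the study. The abstract does not state how the per-class values were aggregated in the three-way case, so this sketch covers the two-way case only.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, PPV, and NPV from counts.

    tp/fn: impaired subjects classified as impaired / as healthy.
    tn/fp: healthy subjects classified as healthy / as impaired.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only; not results from the paper.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on the proportion of each class in the test set, values obtained on an imbalanced dataset such as the one described here may not transfer directly to a population with a different prevalence of impairment.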