Abstract

Background
Dementia, a clinical syndrome associated with progressive neurodegenerative disease, is one of the inevitable consequences of aging. Alzheimer’s disease (AD) is the most common cause of dementia. Accurate early-stage detection and classification of AD, together with prediction of conversion from mild cognitive impairment (MCI) to AD, is of significant clinical importance. However, reliable diagnosis with a multi-class classification approach remains a challenging task. This study aimed to measure the effectiveness of machine learning algorithms integrated with feature importance techniques for multi-class classification of AD in clinical practice.

Method
The machine learning algorithms were validated using a nationwide cohort dataset, the Korean Brain Aging Study for the Early Diagnosis and Prediction of AD (KBASE), to identify the most important risk factors. The database is divided into three classes: normal cognition (NC, n = 91), mild cognitive impairment (MCI, n = 61), and Alzheimer’s disease (AD, n = 49). Eight state-of-the-art supervised machine learning algorithms were applied: Support Vector Machine, Naive Bayes, XGBoost, Decision Tree, Logistic Regression, Random Forest, Bagging, and AdaBoost. Stratified 10-fold cross-validation was used for every train/test split, and several performance measures, including accuracy, precision, recall, and F1-score, were used to compare classifier performance.

Result
Based on the performance evaluation, the XGBoost classifier significantly outperformed all other models, achieving 82.55% accuracy, followed by the Bagging classifier (81%). According to the risk-factor importance calculated by the XGBoost model, the Consortium to Establish a Registry for AD (CERAD) subsets were among the highest-ranked features.
Interestingly, parameters including job, job level, and retirement had minimal, if any, impact on the final performance.

Conclusion
Collectively, XGBoost, a gradient-boosted tree model, can be considered the best-performing supervised machine learning algorithm among those evaluated for multi-class AD classification. The model also provides a visualization of the contribution of each risk factor. The data mining approach integrated with machine learning models could achieve favorable outcomes in accurate risk prediction and classification of AD, and could be applied in the clinical setting as an assistive diagnostic tool.
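
The stratified 10-fold cross-validation and the performance measures described in the Method can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names `stratified_kfold` and `macro_scores` are hypothetical, macro averaging over the three classes is an assumption (the abstract does not state the averaging scheme), and a majority-class baseline stands in for the eight classifiers. No KBASE data are used; only the class counts reported above (NC = 91, MCI = 61, AD = 49) are reproduced as labels.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Return k lists of test indices whose class proportions match the whole set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin, continuing where the previous class
        # stopped, so both class counts and fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Labels only; class sizes taken from the abstract.
    labels = ["NC"] * 91 + ["MCI"] * 61 + ["AD"] * 49
    fold_accuracies = []
    for test_idx in stratified_kfold(labels, k=10):
        held_out = set(test_idx)
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        # Hypothetical stand-in for a real classifier: predict the majority class.
        majority = Counter(train).most_common(1)[0][0]
        y_true = [labels[i] for i in test_idx]
        y_pred = [majority] * len(y_true)
        fold_accuracies.append(macro_scores(y_true, y_pred)["accuracy"])
    print("mean baseline accuracy:", sum(fold_accuracies) / len(fold_accuracies))
```

A baseline of this kind gives a reference point for reading reported accuracies, since always predicting the largest class (NC) is already correct for 91 of 201 participants. In practice a trained model would replace the majority-class prediction inside the loop.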