BackgroundDelayed multiple sclerosis (MS) diagnoses are not uncommon, an early diagnostic tool is urgently warranted. We aimed to develop an effective tool through electronic health records and machine learning techniques to early recognize MS patients from hospital visitors in China. MethodsTwo case sets were collected from January 2016 to December 2018. The training set had 239 MS and 1142 controls, and the test set had 23 MS and 92 controls. The utility of Extreme Gradient Boosting (XGBoost), Random Forest (RF), Naive Bayes, K-nearest-neighbor (KNN) and Support Vector Machine (SVM) in early diagnosis of MS was evaluated by the area under curve of receiver operating characteristic, precision, recall, specificity, accuracy and F1 score. ResultsThe XGBoost performed the best and was used to generate the results. Thirty-four variables which were highly relevant to MS diagnosis were set for the XGBoost model, and their relative importance with MS were ranked. The training set recall was 0.632, with a precision of 0.576, and the test set recall was 0.609, with a precision of 0.609. Our study found that 61%, 51%, and 49% of the patients could be diagnosed with MS, 1, 2, and 3 years earlier than their real diagnostic time point, respectively. ConclusionsA diagnostic tool for early MS recognition based on the XGBoost model and electronic health records were developed to help reduce diagnostic delays in MS.