Warfarin is a common oral anticoagulant, and its effects vary widely among individuals. Numerous dose-prediction algorithms have been reported, most of them built from cross-sectional data using multiple linear regression or machine learning. This study aimed to construct an information fusion perturbation theory and machine learning prediction model of warfarin blood levels based on longitudinal clinical data from cardiac surgery patients. Data for 246 patients were obtained from electronic medical records. Continuous variables were processed by calculating the deviation of the raw data from the moving average (MA ∆vki(sj)), and categorical variables in different attribute groups were processed using the Euclidean distance (ED ǁ∆vk(sj)ǁ). Regression and classification analyses were performed on the raw data, the MA ∆vki(sj) data, and the ED ǁ∆vk(sj)ǁ data. Several machine-learning algorithms were selected from the STATISTICA and WEKA software packages. The random forest (RF) algorithm was the best for predicting continuous outputs using the raw data: its correlation coefficients were 0.978 and 0.595 for the training and validation sets, respectively, and its mean absolute errors were 0.135 and 0.362, respectively. The proportion of ideal predictions of the RF algorithm was 59.0%. General discriminant analysis (GDA) was the best algorithm for predicting the categorical outputs using the MA ∆vki(sj) data, with a total true positive rate (TPR) of 95.4% and 95.6% for the training and validation sets, respectively. An information fusion perturbation theory and machine learning model for predicting warfarin blood levels was thus established. A model based on the RF algorithm could be used to predict the target international normalized ratio (INR), and a model based on the GDA algorithm could be used to predict the probability of being within the target INR range under different clinical scenarios.
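The two preprocessing steps named above can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so this assumes that the moving-average term is the mean of a variable over all records sharing the same condition sj, and that the Euclidean term is the ordinary L2 norm of a vector of such deviations. The function names, the condition labels, and the numeric values are hypothetical and are not taken from the study data.

```python
from collections import defaultdict
from math import sqrt


def ma_deviation(values, conditions):
    """Return each value minus the mean of its condition group.

    Assumption: the 'moving average' is the per-condition mean; the
    study's actual operator may be defined differently.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, condition in zip(values, conditions):
        sums[condition] += value
        counts[condition] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [value - means[condition]
            for value, condition in zip(values, conditions)]


def euclidean_norm(deviations):
    """Return the L2 norm of a vector of deviations."""
    return sqrt(sum(d * d for d in deviations))


# Hypothetical illustration values, not patient data.
inr_values = [2.0, 3.0, 1.5, 2.5]
surgery_type = ["valve", "valve", "bypass", "bypass"]

deviations = ma_deviation(inr_values, surgery_type)
norm_example = euclidean_norm([3.0, 4.0])
```

Under these assumptions, each record is expressed relative to the typical value for its own clinical condition rather than as an absolute measurement.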