Abstract

Background
Recent evidence suggests that guideline-directed anticoagulant therapy for atrial fibrillation (AF) remains controversial. The widely used CHA2DS2-VASc score is based solely on a limited set of traditional cardiovascular risk factors, omitting AF characteristics and other markers of thromboembolic risk. A more efficient, safer and more personalized anticoagulant approach is warranted.

Purpose
To develop a data-driven deep reinforcement learning (DRL) model for guiding dynamic anticoagulant treatment in AF patients to improve cardiovascular outcomes.

Methods
Participants were enrolled from the multicentre China Atrial Fibrillation (China-AF) Registry between August 2011 and December 2022 and followed up regularly every 6 months. Patients on warfarin at baseline were excluded, given its declining use among non-valvular AF patients in China. The DRL model was trained for optimal dynamic decision-making in a randomly selected 70% of patients and subsequently tested in the remaining 30%. Sociodemographic characteristics, AF characteristics, medical history, lifestyle factors, laboratory examinations, and medications were used as inputs for model training. For each patient, the concordance rate between the DRL model's recommendations and physicians' actual non-vitamin-K-antagonist oral anticoagulant (NOAC) prescribing decisions across all visits before censoring was calculated. The primary outcome was the composite of cardiovascular death, ischemic stroke, transient ischemic attack or systemic embolism (SSE), and major bleeding. Shapley additive explanation (SHAP) analysis was used to rank the factors most influential in the DRL model's decision-making.

Results
A total of 20068 patients (mean age: 63.0±12.0 years; 36.2% female) were randomly divided into a training cohort of 14050 patients and a testing cohort of 6018 patients. The model's NOAC recommendations were most strongly influenced by age, prior NOAC prescription, body mass index, history of hypertension, and prior statin prescription (Figure 1). Patients with concordance rates of 50.1%-75% and 75.1%-100% had significant risk reductions for the primary outcome (adjusted HR 0.63; 95% CI, 0.46-0.85; P = 0.003 and adjusted HR 0.59; 95% CI, 0.46-0.75; P < 0.001, respectively), compared with those with a concordance rate of 0-25%. Similar results were observed for all-cause death, cardiovascular death, and SSE: patients with the highest concordance rate had a significantly lower risk than those with the lowest concordance rate. A similar but nonsignificant trend was observed for major bleeding events (Figure 2).

Conclusions
This modelling study suggests that a data-driven DRL model may provide more efficient, safer, and more personalized anticoagulation recommendations, potentially assisting physicians in clinical practice.
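
As a worked illustration of the per-patient concordance rate described in the Methods, the following Python sketch computes the share of follow-up visits before censoring at which the DRL recommendation and the physician's NOAC prescription agree. The data layout and column names (patient_id, visit_date, censor_date, drl_recommends_noac, physician_prescribed_noac) are assumptions for illustration only and are not taken from the study.

    # Minimal sketch, assuming one row per follow-up visit with the columns
    # listed below; these names are illustrative, not from the China-AF dataset.
    import pandas as pd

    def concordance_rates(visits: pd.DataFrame) -> pd.Series:
        """Return each patient's concordance rate in [0, 1].

        Expected columns: 'patient_id', 'visit_date', 'censor_date',
        'drl_recommends_noac' (0/1), 'physician_prescribed_noac' (0/1).
        """
        # Keep only visits that occurred before censoring.
        pre_censor = visits[visits["visit_date"] <= visits["censor_date"]].copy()
        # Flag visits where the model and the physician agree on NOAC prescription.
        pre_censor["agree"] = (
            pre_censor["drl_recommends_noac"] == pre_censor["physician_prescribed_noac"]
        ).astype(int)
        # Per-patient proportion of agreeing visits.
        return pre_censor.groupby("patient_id")["agree"].mean()

    # Patients would then be grouped into concordance strata
    # (0-25%, 25.1-50%, 50.1-75%, 75.1-100%) for outcome comparison.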
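
The outcome comparison across concordance strata could be sketched along the following lines, using a Cox proportional-hazards model with the 0-25% stratum as the reference. This assumes the lifelines package and illustrative variable names (time_to_event, event, concordance_stratum, age, female); the study's actual adjustment set is not specified here.

    # Minimal sketch, assuming one row per patient with illustrative column names.
    import pandas as pd
    from lifelines import CoxPHFitter

    def fit_outcome_model(df: pd.DataFrame) -> CoxPHFitter:
        """Fit a Cox model for the primary outcome across concordance strata.

        Expected columns: 'time_to_event', 'event' (0/1),
        'concordance_stratum' (categorical, reference = '0-25%'),
        plus numeric baseline covariates such as 'age' and 'female' (0/1).
        """
        # One-hot encode the strata, dropping the reference category.
        design = pd.get_dummies(df, columns=["concordance_stratum"], drop_first=True)
        cph = CoxPHFitter()
        cph.fit(design, duration_col="time_to_event", event_col="event")
        return cph

    # cph.summary would then give adjusted hazard ratios for each stratum
    # relative to the 0-25% reference group.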