Abstract

Background/Introduction Peripheral Arterial Disease (PAD) is a common pathology, affecting 4.5% of the UK population. Symptomatic PAD manifests as intermittent claudication (IC). Predictors of progression from IC to Chronic Limb-Threatening Ischaemia (CLTI) and of complications including Major Adverse Limb Events (MALE), such as smoking history and diabetes, have previously been described in resource-intensive cohort studies. Novel Natural Language Processing (NLP) approaches transform routinely collected, unstructured health records into datasets for real-world analysis of PAD.

Purpose This study uses NLP-based interrogation of electronic health records to efficiently identify risk factors for disease progression and MALE in patients presenting with IC.

Methods This was a retrospective cohort study of patients with PAD at a large tertiary vascular referral centre, identified using the SNOMED terms for IC in the Medcat-NLP-AI toolkit. Demographics, Index of Multiple Deprivation (IMD), indicators of disease progression, and MALE (revascularisation and amputation) were analysed using Kaplan-Meier survival and Cox Proportional-Hazards analysis.

Results 5,027 patients (mean age 73.7 (61.8-85.6); males 66.1% (n=2,781)) were identified. Self-reported ethnicity was 4.72% Asian, 15.5% Black, and 79.7% White. All-cause mortality was 19.5% (n=979) after 10 years of follow-up, and 5.85% of patients (n=294) progressed to CLTI. Progression to CLTI occurred in 2.32% of Asian patients, 5.78% of Black patients, and 5.04% of White patients. Amputation was performed in 5.09% of Asian patients, 8.99% of Black patients, and 7.36% of White patients. Multivariate analysis of risk factors for progression to CLTI demonstrated that diabetes (HR 2.36, 95% CI: 1.22-4.48, p = .01) and smoking history (HR 1.76, 95% CI: 1.19-2.59, p = .004) independently predicted progression. No risk factors for MALE reached significance.
Conclusions Diabetes and smoking behaviour are well described in the literature as risk factors for progression to CLTI. This study demonstrates the utility of NLP as a resource-efficient, validated methodological tool. Disparities in MALE may exist between patients within a universal healthcare system, although our sample size was insufficient to demonstrate statistical significance. A larger, multicentre analysis is warranted. NLP represents a valuable tool for efficiently identifying and analysing risk factors for PAD progression and holds potential to refine management algorithms.

Figure: KM curves of CLTI-free survival (DM)
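
For readers unfamiliar with the Kaplan-Meier method named in the Methods, the following is a minimal sketch of the product-limit estimator in plain Python. It is an illustration only, not the study's analysis code: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and no patient data from the cohort are used. The event is assumed to be progression to CLTI, with patients who did not progress treated as censored at their last follow-up.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g. years since presenting with IC)
    events:    1 if the event (e.g. progression to CLTI) was observed,
               0 if the patient was censored at that time
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point.
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    # Number of patients leaving the risk set (event or censoring) at each time.
    exit_counts = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t are no longer at risk afterwards.
        at_risk -= exit_counts[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: six patients, three events, three censored.
    toy_durations = [1, 2, 2, 3, 4, 5]
    toy_events = [1, 0, 1, 0, 1, 0]
    for time, prob in kaplan_meier(toy_durations, toy_events):
        print(f"t={time}: event-free survival = {prob:.3f}")
```

Censored patients contribute to the at-risk denominator until their last follow-up but never trigger a drop in the curve, which is why the method suits routinely collected records with unequal follow-up. Comparing curves between groups (for example, patients with and without diabetes) would apply the same estimator to each group separately.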