Abstract
Background: Treatment decisions for patients with metastatic breast cancer (mBC) are increasingly complex, and this patient subset is poorly characterized. In Sweden, despite the availability of national health registers, it is currently difficult to determine the true prevalence and characteristics of mBC patients at the national level, because the registers lack variables, or have missing information, on recurrence from early to late disease.
Aim: To develop an algorithm trained to identify mBC patients in Swedish national health registers, to estimate the number of mBC patients, and to describe their characteristics and survival outcomes.
Methods: This retrospective database study used Swedish national data (National Patient Register, Prescribed Drug Register, Cancer Register and Cause of Death Register) linked, via unique personal identification number, with metastatic status, outcome and biomarker data from a regional BC register (Uppsala University Hospital). The regional BC register data, containing medical records of known mBC and non-mBC patients from 2009 to 2016, were divided into a training set (n=2,680) and a test set (n=670). Based on known mBC patients’ unique features, derived from the linked national data, we developed a support-vector machine (SVM) trained to identify mBC patients (further detailed in the poster). The model’s performance in classifying mBC patients was measured in the test set: accuracy 97.3%, sensitivity 90.0%, specificity 98.2%, balanced accuracy 94.1%. The SVM algorithm was then used to predict prevalent mBC cases nationally for 2009-2016. Kaplan-Meier estimates were used to model overall survival (OS), defined from the date of first metastatic diagnosis (ICD-10 C78 and/or C79). Cox proportional hazards models were used to test the association of hormone receptor status (HR+/HR-) and of de novo versus recurrent disease with OS. Patients were assumed to be HR+ if they had ≥2 endocrine prescriptions, and de novo if disease-free survival (DFS) was ≤3 months.
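The four test-set metrics reported in the Methods all follow from a single confusion matrix, and balanced accuracy is the mean of sensitivity and specificity. The Python sketch below illustrates that relationship. The counts it uses (63 true positives, 7 false negatives, 589 true negatives, 11 false positives) are hypothetical: they were chosen only because they sum to the test-set size of 670 and reproduce the reported percentages. The abstract does not give the actual confusion matrix, and the function name `classification_metrics` is likewise an illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true mBC patients that were flagged
    specificity = tn / (tn + fp)   # share of non-mBC patients correctly left unflagged
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# HYPOTHETICAL counts for a 670-patient test set; not taken from the study.
metrics = classification_metrics(tp=63, fn=7, tn=589, fp=11)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 97.3%
# sensitivity: 90.0%
# specificity: 98.2%
# balanced_accuracy: 94.1%
```

Because mBC patients are a small minority of the test set, accuracy alone is dominated by the non-mBC class; balanced accuracy weights both classes equally, which may be why it is reported alongside the other three.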
Results: Between 2009 and 2016, we found a total of 150,235 patients alive with a BC diagnosis. Within this population, the SVM algorithm identified a subset of 13,826 (9.2%) mBC patients, corresponding to an incidence of 1,318 per year (13.7 per 100,000) and a prevalence of 5,171 per year (53.8 per 100,000). Median age at mBC diagnosis was 67.5 years and median survival was estimated at 29.8 months. De novo patients made up 18.3% of the total mBC population. Median age at diagnosis was 68.7 years for de novo mBC and 67.2 years for recurrent mBC, and survival was estimated at 30.1 months and 29.7 months, respectively, with a slightly better prognosis for de novo patients (hazard ratio: 0.92; p-value<0.01) after adjusting for age and hormone receptor status. HR+ status showed a statistically significant association with OS (hazard ratio: 0.50; p-value<0.001). Median survival was also calculated by age group: patients aged <50 years, 50-70 years and >70 years at mBC diagnosis had a median survival of 43.3, 37.2 and 20.1 months, respectively.
Conclusion: Previous studies have used machine learning for cancer detection or prognostication. Here we show that machine learning algorithms can also be applied to identify patient subsets in national population health registers. With this study design, we were able to describe the epidemiology and survival of the full Swedish national mBC population; to our knowledge, this is the first study to do so.
Citation Format: Henrik Lindman, Mate Szilcz, Jonatan Freilich, Peter Carlqvist, Simona Vertuani, Barbro Holm. Machine learning to identify and characterize metastatic breast cancer patients in Sweden: A population-based study. CLEE011ASE01 [abstract]. In: Proceedings of the 2019 San Antonio Breast Cancer Symposium; 2019 Dec 10-14; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2020;80(4 Suppl):Abstract nr P2-08-03.
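The median survival figures in the Results are Kaplan-Meier estimates. As a minimal sketch of how such an estimate is obtained, the Python code below implements the product-limit calculation and reads off the first time at which estimated survival falls to 50% or below. The eight-patient cohort is toy data invented for illustration, not study data, and the function names `kaplan_meier` and `median_survival` are hypothetical; the abstract does not state which software was used.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    durations: follow-up time per patient (e.g. months since first metastatic diagnosis)
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First event time at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# TOY cohort (months, event flag); purely illustrative.
months = [6, 12, 18, 24, 30, 36, 42, 48]
died = [1, 1, 0, 1, 1, 0, 1, 1]

curve = kaplan_meier(months, died)
print(median_survival(curve))  # 30
```

Censored patients (event flag 0) contribute to the risk set up to their last follow-up without counting as deaths, which is what distinguishes this estimate from a simple median of observed survival times.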