Abstract

Background: Cancer recurrence is a major event affecting the burden of the disease and a critical decision point for patients and their providers. Population-based information on the risk of cancer recurrence is lacking because recurrence is not routinely collected by cancer registries.

Objective: To develop and implement a scalable, supervised learning algorithm to predict breast cancer recurrence status using information about disease at diagnosis from registry data and information about health care utilization from medical claims.

Data: Medical claims from private insurers and Medicare (2011-2016) linked with the Puget Sound SEER Cancer Registry were made available via the Hutch Institute for Cancer Outcomes Research (HICOR). Gold-standard information on the recurrence of initially localized breast cancer was provided by investigators on the BRAVO study of breast cancer survivors diagnosed 2004-2016 in the Puget Sound area. The HICOR and BRAVO data were linked. The analysis dataset consisted of 111 patients with a recurrence or second breast cancer event and 689 patients without a recurrence or second breast cancer event who had adequate claims available for analysis (insurance enrollment before and after their second event, or for at least 12 consecutive months after primary treatment).

Methods: A gradient-boosting algorithm (XGBoost) was used to predict month-level recurrence status, i.e., whether any given month was before or after a recurrence event. Features included registry information on patient demographics, initial extent of disease, and hormone-receptor status, as well as engineered features based on counts of diagnosis, procedure, and drug claims within code groups drawn from a blend of previously defined groupings and groupings customized for this application. Time-varying features included monthly counts of codes within each group, months since the most recent occurrence and until the subsequent occurrence of each code group, and cumulative sums of each code group.
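The time-varying features and month-level outcome described above can be sketched for a single patient as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format (one dictionary per month mapping a code-group name to its claim count), the group names, the helper names `month_features` and `month_labels`, and the convention that the recurrence month itself is labeled "after" are all hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def month_features(monthly_counts: List[Dict[str, int]], groups: List[str]) -> List[Row]:
    """Hypothetical helper: per-month features for one patient from code-group counts."""
    rows: List[Row] = [{} for _ in monthly_counts]

    # Forward pass: monthly count, cumulative sum, months since the most recent occurrence.
    cumulative = {g: 0 for g in groups}
    last_seen: Dict[str, Optional[int]] = {g: None for g in groups}
    for month, counts in enumerate(monthly_counts):
        for g in groups:
            n = counts.get(g, 0)
            cumulative[g] += n
            if n > 0:
                last_seen[g] = month
            prev = last_seen[g]
            rows[month][f"{g}_count"] = n
            rows[month][f"{g}_cumsum"] = cumulative[g]
            # None means the group has not been coded yet (a missing value).
            rows[month][f"{g}_months_since"] = None if prev is None else month - prev

    # Backward pass: months until the subsequent occurrence (assumed reading of the text).
    next_seen: Dict[str, Optional[int]] = {g: None for g in groups}
    for month in range(len(monthly_counts) - 1, -1, -1):
        for g in groups:
            if monthly_counts[month].get(g, 0) > 0:
                next_seen[g] = month
            nxt = next_seen[g]
            rows[month][f"{g}_months_until"] = None if nxt is None else nxt - month
    return rows


def month_labels(n_months: int, recurrence_month: Optional[int]) -> List[int]:
    """Hypothetical helper: 1 for months at or after the recurrence month, else 0."""
    if recurrence_month is None:  # patient without a recurrence or second event
        return [0] * n_months
    return [1 if m >= recurrence_month else 0 for m in range(n_months)]


# Illustrative patient with two made-up code groups observed over four months.
groups = ["secondary_malignancy", "pathology"]
monthly = [{"pathology": 2}, {}, {"secondary_malignancy": 1}, {"pathology": 1}]
rows = month_features(monthly, groups)
labels = month_labels(len(monthly), recurrence_month=2)
```

Each patient-month row would then be stacked, together with the static registry features, into a feature matrix for a gradient-boosting classifier such as XGBoost; converting `None` to `float("nan")` lets XGBoost treat those entries as missing values. Any train/test split should be done at the patient level so that months from one patient do not appear in both sets.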
Subjects were split into a training set (n=94) and a test set (n=17) for reporting performance results. The training data were further split 5:1 for cross-validation.

Results: The most important variables included time since coding of a secondary malignancy, the cumulative sum of pathology-related codes, and codes related to catheter placement. The month-specific AUC on a validation subset (n=17 patients) was 0.89; individual-level (sensitivity, specificity) pairs ranged from (0.824, 0.946) to (0.706, 0.982).

Conclusions: Data sources that link claims, cancer registry, and gold-standard disease status information are critical for developing novel, automated approaches to detecting cancer recurrence. Gradient-boosted learning with engineered time-varying features shows promise for identifying recurrence events in administrative claims. Proper definition of procedure and drug code groups is likely to be key to the performance of such algorithms. The incompleteness of claims data is a major challenge.

Citation Format: Teresa A'mar, Daniel Markowitz, Jessica Chubak, David Beatty, Catherine Fedorenko, Christopher Li, Kathi Malone, Ruth Etzioni. Predicting recurrence or second breast cancer using linked claims and cancer registry data with limited gold-standard information: A gradient-boosting approach [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr A09.