Population-based registries are increasingly used for comparative effectiveness research (CER) in prostate cancer (PCa). These observational studies (Obs) require fewer resources than randomized controlled trials (RCTs) but are subject to systematic errors, including missing key covariates, a lack of early oncologic endpoints, and numerous forms of unmeasured confounding that can substantially bias outcomes. RCTs eliminate many of these limitations and are the gold standard for CER. We aimed to compare survival outcomes between Obs and RCTs, and to explore factors that might improve agreement in order to support the role of Obs in CER. A systematic search was performed to identify Obs published between 1/2000 and 12/2018 that compared two PCa treatment regimens using Surveillance, Epidemiology and End Results (SEER), SEER-Medicare, or the National Cancer Database. RCTs comparing the same treatments in a similar patient population were then identified. Correlation between survival hazard ratio (HR) estimates was assessed using the concordance correlation coefficient (CCC). Agreement between the conclusions of Obs and RCTs was assessed using the kappa statistic. Multivariable analyses (MVA) were performed to evaluate predictors of agreement after multiple testing correction. Of 433 Obs reviewed, 81 comparisons met eligibility criteria. There was no correlation between HR estimates from Obs and RCTs (CCC=0.0096). Only 21% of Obs and RCT conclusions agreed (kappa -0.31 (95% CI -0.17 to -0.022)), and only 45% of the Obs HRs fell within the 95% CIs of the matched RCTs. No study design factor, including database source, reporting quality, variable adjustments, statistical methods (e.g., propensity weighting, instrumental variable, or sensitivity analyses), or degree of matching of eligibility criteria, improved agreement between outcomes.
On MVA, the only predictor of improved agreement was an Obs conclusion of no significant difference between arms (OR 23.4 (5.8–138.6), p<0.0001). Additionally, Obs evaluating surgery and other modalities not involving radiation therapy (RT) had significantly worse agreement with RCTs (OR 0.06 (0.01–0.38), p=0.003). No particular RT comparison (i.e., dose escalation, use of brachytherapy, or addition of systemic therapy) improved agreement (p=0.34). Agreement between Obs and RCTs comparing PCa treatments is worse than would be expected by chance alone, particularly for studies investigating surgical and systemic therapies. No modifiable study design factors were identified that improved agreement with RCTs. Thus, given the poor correlation between Obs and RCT conclusions, population-based registry studies should not be viewed as hypothesis-generating and indeed have a worse-than-random chance of validating RCT results.
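As an illustration of the three agreement measures named above, the following is a minimal Python sketch of Lin's concordance correlation coefficient, Cohen's kappa, and a check of whether a point estimate lies inside a confidence interval. It is not the authors' analysis code. The function names are hypothetical, the abstract does not state whether HRs were log-transformed before computing the CCC, and how conclusions were categorized for the kappa statistic is likewise an assumption left to the caller.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values.

    Uses population (1/n) variances and covariance, as in Lin's definition.
    Whether to pass raw or log-transformed HRs is the caller's choice
    (an assumption; the abstract does not specify).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov_xy / denom


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two paired sequences of categorical conclusions."""
    n = len(labels_a)
    if n != len(labels_b) or n == 0:
        raise ValueError("label sequences must be paired and non-empty")
    labels_a = list(labels_a)
    labels_b = list(labels_b)
    # Observed proportion of pairs with identical conclusions.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from the marginal frequencies.
    categories = set(labels_a) | set(labels_b)
    p_expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def hr_within_ci(hr, ci_lower, ci_upper):
    """True if an Obs HR lies inside the matched RCT's confidence interval."""
    return ci_lower <= hr <= ci_upper
```

A kappa below zero means the observed agreement is lower than the agreement expected from the marginal frequencies alone, which is the sense in which the abstract describes agreement as worse than chance.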