Background: Large health insurance claims databases can be used to estimate rates of rare safety outcomes. We measured incidence rates of rare outcomes that could be used to contextualize adverse events among people receiving pneumococcal vaccines in clinical trials or clinical practice. However, algorithms used to identify outcomes in administrative databases are subject to error. Using two algorithms for each outcome, we assessed the influence of algorithm choice on the estimated rates.

Methods: We used closed administrative medical and pharmacy claims in the Healthcare Integrated Research Database℠ (HIRD) to construct a broad cohort of individuals less than 100 years old (i.e., the target cohort) and a trial-similar cohort of individuals resembling those potentially eligible for a vaccine clinical trial (e.g., for a pneumococcal vaccine). We stratified by age and sex and used specific and sensitive algorithms to estimate rates of 39 outcomes, including cardiac/cerebrovascular, metabolic, allergic/autoimmune, neurological, and hematologic outcomes. Specific algorithms were intended to reduce false-positive errors, while sensitive algorithms were intended to reduce false-negative errors, thereby providing lower and upper bounds for the “true” rates.

Results: We followed approximately 40 million individuals in the target cohort for an average of 3 years. Of 39 outcomes, 14 (36%) had a rate from the specific algorithm that was less than half the rate from the sensitive algorithm. Rates of cardiac/cerebrovascular outcomes were the most consistent (mean ratio of specific-algorithm rates to sensitive-algorithm rates = 0.76), while rates of neurological and hematologic outcomes were the least consistent (mean ratio of rates = 0.33 and 0.36, respectively).

Conclusions: For many cardiac/cerebrovascular outcomes, rates were similar regardless of the algorithm. For other outcomes, rates varied substantially by algorithm. Using multiple algorithms to ascertain outcomes in claims data can be informative about the extent of uncertainty due to outcome misclassification.
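
The specific-to-sensitive comparison reported above can be illustrated with a minimal Python sketch. It is not the study's code: the names `OutcomeCounts`, `rate_per_100k`, and `specific_to_sensitive_ratio` are hypothetical, the event counts and person-years are invented purely for illustration, and the sketch assumes both algorithms share the same person-time denominator, which may not hold in the actual analysis.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCounts:
    """Hypothetical container for one outcome; not taken from the study."""
    name: str
    specific_events: int    # events identified by the specific (narrow) algorithm
    sensitive_events: int   # events identified by the sensitive (broad) algorithm
    person_years: float     # follow-up time, assumed identical for both algorithms


def rate_per_100k(events: int, person_years: float) -> float:
    """Incidence rate expressed per 100,000 person-years."""
    return events / person_years * 100_000


def specific_to_sensitive_ratio(outcome: OutcomeCounts) -> float:
    """Ratio of the specific-algorithm rate to the sensitive-algorithm rate."""
    specific = rate_per_100k(outcome.specific_events, outcome.person_years)
    sensitive = rate_per_100k(outcome.sensitive_events, outcome.person_years)
    return specific / sensitive


# Invented illustrative inputs (NOT study data).
examples = [
    OutcomeCounts("outcome_a", specific_events=80, sensitive_events=100, person_years=1_000_000),
    OutcomeCounts("outcome_b", specific_events=30, sensitive_events=100, person_years=1_000_000),
]

# Flag outcomes whose specific rate is less than half the sensitive rate.
flagged = [o.name for o in examples if specific_to_sensitive_ratio(o) < 0.5]

for o in examples:
    print(f"{o.name}: ratio = {specific_to_sensitive_ratio(o):.2f}")
print("less than half:", flagged)
```

A ratio near 1 suggests the estimated rate is robust to algorithm choice, while a ratio below 0.5 corresponds to the "less than half" criterion used in the Results.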