571 Background: Adverse event (AE) assessment is an important component of clinical trials and routine clinical care. Natural language processing (NLP) approaches may facilitate automated methods to identify AEs, potentially allowing both early intervention and streamlined trial AE documentation. We previously demonstrated that an NLP pipeline had high accuracy for identifying toxicities associated with radiotherapy from clinical notes compared to physician reviewers. This study aims to evaluate this pipeline in comparison with clinical trial reporting and physician reviewers. Methods: We identified a total of 53 encounters among 20 patients enrolled across four immunotherapy trials with both corresponding notes and associated AEs. Two independent physician reviewers (DK & LN) identified symptoms documented in notes using a National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 reporting tool. Likewise, a previously described NLP pipeline based on the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was applied to the notes to identify CTCAE symptoms. The sensitivity and specificity of NLP-extracted symptoms and of reviewer-identified symptoms were assessed against trial-reported AEs as the reference. Cohen's kappa was used to assess inter-rater reliability between physician reviewers. Results: The five most reported trial AEs were diarrhea (17%), fever (17%), fatigue (15%), constipation (13.2%), and nausea (11.3%) (Table). Except for anorexia, NLP had sensitivity greater than or equal to that of both reviewers (R1, R2) for reported AEs. However, both reviewers had greater specificity than NLP. Between the reviewers, vomiting and anorexia had the highest Cohen's kappa scores (1 and 0.88, respectively), while fever and constipation had the lowest (0.31 and 0.47, respectively; Table). Conclusions: NLP can detect toxicity symptoms from clinical notes with sensitivity greater than or equal to that of physician reviewers, while physician reviewers had greater specificity. 
These results indicate that our NLP approach over-identifies symptoms, while manual reviewers are more conservative. NLP may therefore be best suited to the screening setting. NLP may also help reduce inter-rater variability between reviewers, particularly for fever, constipation, and fatigue. [Table: see text]
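The metrics named in the Methods can be illustrated with a minimal sketch. It assumes each encounter is reduced to a binary flag (symptom present or absent) for a single CTCAE symptom, with the trial-reported AE treated as the reference. The function names and the toy data are hypothetical and are not taken from the study or its pipeline.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of `predicted` against `reference`."""
    pairs = [(bool(r), bool(p)) for r, p in zip(reference, predicted)]
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    # Undefined ratios (no positives or no negatives) are returned as NaN.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def cohen_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters assigning binary labels."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if bool(a) == bool(b)) / n
    p_a = sum(1 for a in rater_a if a) / n
    p_b = sum(1 for b in rater_b if b) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters are constant
    return (observed - expected) / (1 - expected)


# Hypothetical flags for one symptom across eight encounters (illustration only).
trial_ae = [True, True, False, False, True, False, False, False]
nlp = [True, True, True, False, True, True, False, False]
reviewer_1 = [True, False, False, False, True, False, False, False]
reviewer_2 = [True, True, False, False, True, False, False, True]

nlp_sens, nlp_spec = sensitivity_specificity(trial_ae, nlp)
r1_sens, r1_spec = sensitivity_specificity(trial_ae, reviewer_1)
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"NLP: sensitivity={nlp_sens:.2f}, specificity={nlp_spec:.2f}")
print(f"R1:  sensitivity={r1_sens:.2f}, specificity={r1_spec:.2f}")
print(f"Kappa (R1 vs R2): {kappa:.2f}")
```

In this toy data the NLP flags capture every reference AE but add false positives, while the reviewer misses one AE and adds none, which mirrors the sensitivity and specificity trade-off described in the Conclusions.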