A safe and pragmatic guide for labeling and delabeling patients with suspected penicillin allergy is needed. We aimed to compare the performance of four penicillin allergy prediction strategies in a large independent cohort. We conducted a retrospective study of subjects with a history of hypersensitivity to penicillins who presented at the University Hospital of Montpellier between January 2014 and December 2021. The outcome of interest was a positive penicillin allergy test. Of the 1884 participants included, 382 (20.3%) had positive penicillin allergy tests. The ENDA (European Network on Drug Allergy) and Blumenthal strategies yielded relatively high sensitivities and low specificities and, by design, did not misclassify any positive subjects with severe index reactions. The PEN-FAST <3 score had a negative predictive value of 90% (95% CI, 88%-91%), a sensitivity of 66% (95% CI, 62%-71%), and a specificity of 73% (95% CI, 71%-75%), and incorrectly delabeled 18 subjects with anaphylaxis and 15 with other severe non-immediate reactions. For the adapted Chiriac score, the specificity at 66% sensitivity was 73% (95% CI, 70%-75%); conversely, the sensitivity at 73% specificity was 65% (95% CI, 61%-70%). Attempts to improve these prediction algorithms did not substantially enhance performance. The ENDA and Blumenthal strategies are safe for high-risk subjects, but their delabeling effectiveness is limited, leading to unnecessary penicillin avoidance. Conversely, the PEN-FAST and Chiriac scores are more effective at delabeling but more frequently misclassify high-risk subjects with positive penicillin allergy tests. Selecting the most appropriate tool requires careful consideration of the target population and the desired goal.
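As a rough illustration of how the reported metrics relate to one another, the sketch below recomputes a negative predictive value from sensitivity, specificity, and prevalence using the standard Bayes relation. It is not the study's analysis code: the function name is hypothetical, and the inputs are the rounded figures quoted above (66% sensitivity, 73% specificity, 382 positives among 1884 participants), so the result only approximates the published estimate.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = P(test-negative subject is truly non-allergic).

    Hypothetical helper for illustration; inputs are proportions in [0, 1].
    """
    # Fraction of the cohort that is truly negative AND classified low risk.
    true_negatives = specificity * (1.0 - prevalence)
    # Fraction of the cohort that is truly positive BUT classified low risk.
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Rounded values taken from the abstract (assumption: rounding error applies).
prevalence = 382 / 1884
npv = negative_predictive_value(sensitivity=0.66, specificity=0.73,
                                prevalence=prevalence)
print(f"prevalence = {prevalence:.3f}, NPV = {npv:.3f}")
```

With these rounded inputs the computed value is roughly 0.89, close to the reported 90%; the small gap is plausibly due to rounding of the sensitivity and specificity. The same relation shows why negative predictive value depends on the prevalence of confirmed allergy in the target population, which is one reason the choice of tool depends on the setting.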