Abstract

Introduction
Scoring algorithms have the potential to increase polysomnography (PSG) scoring efficiency while also ensuring consistency and reproducibility. We sought to validate an updated event detection algorithm (Somnolyzer; Philips, Monroeville, PA, USA) against manual scoring by analyzing a dataset we have previously used to report scoring variability across nine member centers of the Sleep Apnea Global Interdisciplinary Consortium (SAGIC).

Methods
Fifteen PSGs collected at a single sleep clinic were scored independently by technologists at nine SAGIC centers located in six countries, and were auto-scored with the algorithm. Arousals, apneas, and hypopneas were identified according to the American Academy of Sleep Medicine recommended criteria. We calculated the intraclass correlation coefficient (ICC) and performed a Bland-Altman analysis comparing the average manual-scored and auto-scored apnea-hypopnea index (AHI), arousal index (ArI), total apneas, obstructive apneas, central apneas, mixed apneas, and hypopneas. We hypothesized that the auto-scored values would show good agreement and reliability when compared with the average across manual scorers.

Results
Participants contributing to the original dataset had a mean (SD) age of 47 (12) years and an AHI of 24.7 (18.2) events/hour; 80% were male. The ICCs (95% confidence interval) between average manual scoring and auto-scoring were almost perfect (ICC = 0.80–1.00) for AHI [0.989 (0.968, 0.996)], ArI [0.897 (0.729, 0.964)], hypopneas [0.992 (0.978, 0.997)], total apneas [0.973 (0.924, 0.991)], and obstructive apneas [0.919 (0.781, 0.972)], and moderately reliable (ICC = 0.40–0.60) for central [0.537 (0.069, 0.815)] and mixed [0.502 (0.021, 0.798)] apneas.
Similarly, Bland-Altman analyses supported good agreement in event detection between techniques, with a mean difference (limits of agreement) of only 1.45 (-3.22, 6.12) events/hour for AHI; 5.2 (-23.9, 34.3) for total apneas; 1.8 (-45.9, 49.5) for obstructive apneas; 1.8 (-9.7, 13.4) for central apneas; 1.6 (-14.8, 17.9) for mixed apneas; and 4.3 (-12.4, 20.9) for hypopneas.

Conclusion
Results support almost perfect reliability between auto-scoring and manual scoring for AHI, ArI, hypopneas, total apneas, and obstructive apneas, and moderate reliability for central and mixed apneas. There was good agreement between methods, with small mean differences; the wider limits of agreement for specific apnea types did not affect the accuracy of the overall AHI. Thus, the auto-scoring algorithm appears reliable for event detection.

Support (if any)
Philips
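The Bland-Altman quantities reported above follow the standard convention: the mean of the paired (auto minus manual) differences, with 95% limits of agreement at that mean ± 1.96 standard deviations of the differences. A minimal sketch in Python of that calculation; the function name and the AHI values are illustrative only and are not taken from the study data:

```python
import numpy as np

def bland_altman(manual, auto):
    """Mean difference and 95% limits of agreement for paired measurements.

    Limits of agreement = mean difference +/- 1.96 * SD of the differences
    (sample SD, ddof=1).
    """
    diffs = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
    mean_diff = diffs.mean()
    sd_diff = diffs.std(ddof=1)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# Hypothetical paired AHI values (events/hour), for illustration only
manual_ahi = [12.0, 25.4, 40.1, 8.7, 33.3]
auto_ahi = [13.1, 26.0, 41.5, 9.2, 34.0]
mean_diff, lower, upper = bland_altman(manual_ahi, auto_ahi)
print(f"mean difference {mean_diff:.2f}, limits of agreement ({lower:.2f}, {upper:.2f})")
```

A small mean difference with narrow limits, as seen for AHI in the abstract, indicates that the two scoring methods can be used interchangeably for that metric.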