Abstract
Introduction
Scoring algorithms have the potential to increase polysomnography (PSG) scoring efficiency while also ensuring consistency and reproducibility. We sought to validate an updated sleep staging algorithm (Somnolyzer; Philips, Monroeville, PA, USA) against manual sleep staging, by analyzing a dataset we have previously used to report sleep staging variability across nine member centers of the Sleep Apnea Global Interdisciplinary Consortium (SAGIC).

Methods
Fifteen PSGs collected at a single sleep clinic were scored independently by technologists at nine SAGIC centers located in six countries, and auto-scored with the algorithm. Each 30-second epoch was staged manually according to American Academy of Sleep Medicine criteria. We calculated the intraclass correlation coefficient (ICC) and performed a Bland-Altman analysis comparing the average manual-scored and auto-scored total sleep time (TST) and time in each sleep stage (N1, N2, N3, and rapid eye movement [REM]). We hypothesized that the auto-scored values would show good agreement and reliability when compared to the average across manual scorers.

Results
The participants contributing to the original dataset had a mean (SD) age of 47 (12) years, and 80% were male. Auto-scoring showed substantial (ICC = 0.60-0.80) or almost perfect (ICC = 0.80-1.00) reliability compared to the manual-scoring average, with ICCs (95% confidence intervals) of 0.976 (0.931, 0.992) for TST, 0.681 (0.291, 0.879) for time in N1, 0.685 (0.299, 0.881) for time in N2, 0.922 (0.791, 0.973) for time in N3, and 0.930 (0.811, 0.976) for time in REM. Similarly, Bland-Altman analyses showed good agreement between methods, with mean differences (limits of agreement) of only 1.2 (-19.7, 22.0) minutes for TST, 13.0 (-18.2, 44.1) minutes for N1, -13.8 (-65.7, 38.1) minutes for N2, -0.33 (-26.1, 25.5) minutes for N3, and -1.2 (-25.9, 23.5) minutes for REM.

Conclusion
These results support high reliability and good agreement between the auto-scoring algorithm and average human scoring for measurements of sleep durations. Auto-scoring slightly overestimated N1 and underestimated N2, but results for TST, N3, and REM were nearly identical on average. Thus, the auto-scoring algorithm is acceptable for sleep staging when compared against human scorers.

Support (if any)
Philips.
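As a minimal illustration of the two agreement analyses named in Methods, the sketch below computes the Bland-Altman mean difference (bias) with 95% limits of agreement, and a two-way random-effects, single-measure ICC(2,1) following Shrout and Fleiss. The abstract does not state which ICC form the authors used, so ICC(2,1) is an assumption, and all variable names are hypothetical.

import numpy as np

def bland_altman(manual, auto):
    """Bland-Altman agreement: bias and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def icc_2_1(ratings):
    """Two-way random-effects, single-measure ICC(2,1), per Shrout & Fleiss.

    ratings: (n_targets, k_raters) array. Here each row would be one PSG and
    the k = 2 columns would be the manual-scorer average and the auto-scored
    value for a given metric (an assumed layout, not stated in the abstract).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares without replication
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)            # between-targets mean square
    msc = ss_cols / (k - 1)            # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

For example, stacking the 15 per-PSG manual-scorer averages and auto-scored values for one metric into a (15, 2) array (e.g., np.column_stack([manual_mean_tst, auto_tst])) and passing it through both functions would yield the kind of per-metric ICC and bias estimates reported in Results.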