Abstract

Introduction

Despite an appreciable rise in sleep wellness and sleep medicine A.I. research publications, public data corpora, institutional support, and health A.I. research funding opportunities, clinical validation evidence from controlled retrospective, hybrid retrospective-prospective, and prospective RCT-quality studies remains limited relative to the potential clinical impact of these technologies. Furthermore, only a few practical examples of A.I. technologies that assist in sleep diagnosis and treatment are validated, in clinical use today, and widely adopted. In this study, we contribute to this growing body of clinical A.I. validation evidence and experimental design methodology with an interoperable A.I. scoring engine evaluated in Adult and Pediatric populations.

Methods

Stratified random sampling with proportionate allocation was applied to a database of N>10,000 retrospective diagnostic clinical polysomnography (PSG) studies, selected by evidence-grading standards, with controls applied for OSA severity; diagnoses (sleep, psychiatric, neurologic, neurodevelopmental, cardiac, pulmonary, and metabolic disorders); medications (benzodiazepines, antidepressants, stimulants, opiates, and sleep aids); and demographic groups of interest (sex, adult age, pediatric age, BMI, weight, height, and patient-reported sleepiness), to establish representative N=100 Adult and N=100 Pediatric samples. Double-blinded scoring was prospectively collected for each sample from 3 experienced RPSGT-certified sleep technologists randomized from a pool of 9 scorers. Sensitivity (PA), specificity (NA), accuracy (OA), kappa (K), and 95% bootstrap CIs are presented for sleep stages, OSA/CSA, 3%/4% hypopneas, arousals, limb movements, Cheyne-Stokes respiration, periodic breathing, atrial fibrillation, and other events, and for normal, mild, moderate, and severe OSA categories for global AHI and REM AHI. Results for sleep staging and OSA severity diagnostic accuracy are summarized.

Results

A.I.
scoring performance met, and in most cases exceeded, the PA, NA, OA, and K point estimates and confidence-interval results of the initial clinical validation study (N=72 Adults, 2017) for the 26 event types and 8 AHI categories evaluated. The Adult sample showed 87%/94% sensitivity/specificity across all stages (Wake/N1/N2/N3/REM) and 94%/96% sensitivity/specificity for AHI>=15. The Pediatric sample showed 87%/93% sensitivity/specificity for staging and 89%/98% sensitivity/specificity for AHI>=15. Observed accuracy was >90% in both the Adult and Pediatric samples for all 26 events and for 7 of the 8 AHI categories analyzed, the exception being REM-AHI>=5 (85%/82% Adults/Pediatrics).

Conclusion

We provide clinical validation evidence demonstrating interoperable A.I. scoring performance in representative Adult and Pediatric clinical PSG samples when compared against a prospective, double-blinded scoring panel.

Support (if any):
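The sampling procedure described in Methods (stratified random sampling with proportionate allocation over a large retrospective PSG database) can be sketched as follows. This is a minimal illustration, not the study's implementation: the record format, the stratum key, and the use of largest-remainder rounding to resolve fractional seat counts are all assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n_total, seed=0):
    """Stratified random sampling with proportionate allocation.

    Each stratum receives seats proportional to its share of the
    population (floor-rounded); leftover seats go to the strata with
    the largest fractional remainders (largest-remainder method).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    total = len(records)
    quotas = {s: n_total * len(members) / total for s, members in strata.items()}
    alloc = {s: int(q) for s, q in quotas.items()}          # floor each quota
    leftover = n_total - sum(alloc.values())                # seats still unassigned
    for s in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1

    sample = []
    for s, members in strata.items():
        # A stratum smaller than its allocation contributes everything it has.
        sample.extend(rng.sample(members, min(alloc[s], len(members))))
    return sample
```

In practice the stratum key would combine the controlled variables (OSA severity, diagnoses, medications, demographics); here a single categorical key stands in for that joint stratification.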
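The agreement metrics reported above (sensitivity/PA, specificity/NA, accuracy/OA, kappa, and 95% bootstrap CIs) can be computed from paired scorer-vs-A.I. label sequences along these lines. This is a hedged sketch under assumed binary/categorical label encodings; the percentile bootstrap shown is one common CI construction and may differ from the study's exact procedure.

```python
import random

def confusion(y_true, y_pred, positive=1):
    """2x2 confusion counts, treating `positive` as the event class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def sensitivity(y_true, y_pred, positive=1):
    tp, _, _, fn = confusion(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else float("nan")

def specificity(y_true, y_pred, positive=1):
    _, tn, fp, _ = confusion(y_true, y_pred, positive)
    return tn / (tn + fp) if tn + fp else float("nan")

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)                       # observed agreement
    labels = set(y_true) | set(y_pred)
    pe = sum((list(y_true).count(l) / n) * (list(y_pred).count(l) / n)
             for l in labels)                           # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def bootstrap_ci(y_true, y_pred, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for any paired statistic (e.g. accuracy)."""
    rng = random.Random(seed)
    n = len(y_true)
    vals = sorted(
        stat([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for idx in ([rng.randrange(n) for _ in range(n)] for _ in range(n_boot))
    )
    return vals[int(n_boot * alpha / 2)], vals[int(n_boot * (1 - alpha / 2)) - 1]
```

For multi-class outputs such as the five sleep stages, per-stage sensitivity/specificity follow by treating each stage in turn as the positive class (one-vs-rest), with kappa computed over the full label set.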