Obstructive sleep apnea diagnosis is based on the manual scoring of respiratory events. Agreement in this manual scoring has not been investigated in depth, as most previous studies reported only the apnea-hypopnea index or overall agreement, and not temporal (second-by-second) or event-subtype agreement. We hypothesized that temporal and subtype agreement would be low because event duration and subtypes are not generally considered in current clinical practice. The data comprised 50 polysomnography recordings scored by 10 experts. Agreement between scorers on respiratory events was calculated second by second using kappa statistics. Obstructive sleep apnea severity categories (no obstructive sleep apnea, mild, moderate, severe) were compared between scorers. Fleiss' kappa for binary (event/no event) respiratory event scoring was 0.32. When calculated separately within sleep stages N1, N2, N3 and R, Fleiss' kappa was 0.12, 0.23, 0.22 and 0.23, respectively. In binary analyses conducted separately for each event subtype, Fleiss' kappa was highest for hypopneas, at 0.26. In 34% of the participants, the obstructive sleep apnea severity category was the same regardless of the scorer; in the remaining participants, the category changed depending on the scorer. Our findings indicate that agreement in the manual scoring of respiratory events depends on event type and sleep stage. Manual scoring has discrepancies, and these discrepancies affect obstructive sleep apnea diagnosis. This is an alarming finding, as these differences in scoring ultimately affect treatment decisions.
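As an illustration of what a second-by-second binary agreement analysis might involve, the following is a minimal sketch and not the study's actual pipeline. It assumes each scorer's respiratory events are available as integer-second, half-open `(start_s, end_s)` intervals. The helper names `events_to_mask` and `fleiss_kappa`, and the toy data, are hypothetical. Each recording is expanded into one 0/1 label per second per scorer, and Fleiss' kappa is then computed over all seconds using its standard definition.

```python
from typing import List, Sequence, Tuple


def events_to_mask(events: Sequence[Tuple[int, int]], n_seconds: int) -> List[int]:
    """Expand (start_s, end_s) events into a per-second 0/1 mask.

    Assumption: intervals are half-open and given in whole seconds.
    """
    mask = [0] * n_seconds
    for start, end in events:
        for t in range(max(start, 0), min(end, n_seconds)):
            mask[t] = 1
    return mask


def fleiss_kappa(ratings: Sequence[Sequence[int]], n_categories: int = 2) -> float:
    """Fleiss' kappa, where ratings[i][r] is the category (0..n_categories-1)
    that scorer r assigned to second i. Every second needs the same number
    of scorers."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("no rated seconds")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two scorers are required")

    category_totals = [0] * n_categories
    sum_item_agreement = 0.0
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every second must be rated by all scorers")
        counts = [0] * n_categories
        for label in row:
            counts[label] += 1
        for j, c in enumerate(counts):
            category_totals[j] += c
        # Proportion of agreeing scorer pairs for this second.
        sum_item_agreement += (sum(c * c for c in counts) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    p_bar = sum_item_agreement / n_items  # mean observed agreement
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in category_totals)  # chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all labels are identical")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example: three hypothetical scorers, one 4-second recording.
scorer_events = [[(0, 2)], [(0, 1)], [(0, 3)]]
masks = [events_to_mask(ev, n_seconds=4) for ev in scorer_events]
per_second = list(zip(*masks))  # one row of scorer labels per second
kappa = fleiss_kappa(per_second)
print(round(kappa, 3))  # 0.333
```

In this toy case all scorers agree that an event occurred, yet they disagree on its duration, and the second-by-second kappa is only about 0.33. This is the kind of temporal disagreement that an event count or the apnea-hypopnea index alone would not reveal.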