Abstract

Introduction
Convenient sleep tracking with mobile devices such as smartphones is desirable for people who want an easy, objective measure of their sleep. The objective of this study was to introduce a deep learning model for sound-based sleep staging using audio data recorded with smartphones during sleep.

Methods
Two audio datasets were used. One (N = 1,154) was extracted from polysomnography (PSG) recordings, and the other (N = 327) was recorded with a smartphone during PSG from an independent group of subjects. The performance of sound-based sleep staging inevitably depends on audio quality: in practical conditions (non-contact, smartphone microphones), breathing and body movement sounds during the night are so weak that their energy is sometimes lower than that of ambient noise. The audio was therefore converted into Mel spectrograms to detect latent time-frequency patterns of breathing and body movement sounds against ambient noise. The proposed neural network model consisted of two sub-models: the first extracted features from the Mel spectrogram of each 30-second epoch, and the second classified sleep stages through inter-epoch analysis of the extracted features.

Results
Our model achieved 70% epoch-by-epoch agreement for 4-class (wake, light, deep, rapid eye movement [REM]) stage classification, with robust performance across various signal-to-noise conditions. Per stage, the model was correct in 77% of wake, 73% of light, 46% of deep, and 66% of REM epochs. Performance was not considerably affected by the presence of sleep apnea, but degradation was observed with severe periodic limb movement. External validation on the smartphone dataset showed 68% epoch-by-epoch agreement. Compared with commercially available sleep trackers such as Fitbit Alta HR (mean per-class sensitivity 0.6325) and SleepScore Max (0.565), our model performed better on both PSG audio (0.655) and smartphone audio (0.6525).

Conclusion
To the best of our knowledge, this is the first end-to-end deep learning model, from Mel spectrogram-based feature extraction to sleep staging, that works with audio recorded in practical conditions. Our sound-based sleep staging model has the potential to be integrated into smartphone applications for reliable at-home sleep tracking.

Support (If Any)
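The abstract specifies the pipeline only at a high level: audio from each 30-second epoch is converted to a Mel spectrogram, a first sub-model extracts per-epoch features, and a second sub-model assigns stages by analyzing the sequence of epoch features. The following is a minimal sketch of that two-stage structure, not the authors' implementation; the sampling rate, Mel parameters, CNN feature extractor, and Transformer sequence encoder are illustrative assumptions, since the abstract does not disclose the actual architecture or hyperparameters.

```python
# Minimal sketch of the described pipeline (illustrative, not the
# authors' code): 30 s audio epochs -> Mel spectrograms -> per-epoch
# feature extractor (sub-model 1) -> inter-epoch sequence classifier
# (sub-model 2) -> one of 4 stages (wake/light/deep/REM) per epoch.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000   # assumed microphone sampling rate
EPOCH_SECONDS = 30     # standard PSG scoring epoch (from the abstract)
N_MELS = 64            # assumed Mel filterbank size
NUM_STAGES = 4         # wake, light, deep, REM

# Waveform of one epoch -> log-Mel spectrogram
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=512, n_mels=N_MELS)
to_db = torchaudio.transforms.AmplitudeToDB()

class EpochFeatureExtractor(nn.Module):
    """Sub-model 1: a feature vector from each epoch's Mel spectrogram."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(32, feat_dim)

    def forward(self, spec):  # spec: (batch, 1, n_mels, time)
        return self.proj(self.cnn(spec).flatten(1))  # (batch, feat_dim)

class InterEpochClassifier(nn.Module):
    """Sub-model 2: stage per epoch from the sequence of epoch features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(feat_dim, NUM_STAGES)

    def forward(self, feats):  # feats: (batch, n_epochs, feat_dim)
        return self.head(self.encoder(feats))  # (batch, n_epochs, 4)

# Usage on one (synthetic) night of audio split into 30 s epochs:
extractor, classifier = EpochFeatureExtractor(), InterEpochClassifier()
epochs = torch.randn(20, SAMPLE_RATE * EPOCH_SECONDS)  # 20 dummy epochs
specs = to_db(mel(epochs)).unsqueeze(1)   # (20, 1, 64, time)
feats = extractor(specs).unsqueeze(0)     # (1, 20, 128)
stage_logits = classifier(feats)          # (1, 20, 4)
```

A recurrent encoder (e.g., an LSTM) would be an equally plausible stand-in for the inter-epoch sub-model; what the abstract does commit to is the split between intra-epoch feature extraction and inter-epoch sequence analysis, which gives the classifier temporal context across neighboring epochs.

The comparison metric, mean per-class sensitivity, is the unweighted average of the per-stage recalls; the per-stage figures reported above reproduce the PSG-audio value:

```python
# Mean per-class sensitivity = unweighted mean of per-stage recalls,
# using the per-stage figures the abstract reports for PSG audio.
sens = {"wake": 0.77, "light": 0.73, "deep": 0.46, "rem": 0.66}
print(round(sum(sens.values()) / len(sens), 4))  # 0.655, as reported
```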
