Introduction: Automated sleep staging using deep learning models typically requires training on hundreds of sleep recordings, and pre-training on public databases is therefore common practice. However, suboptimal sleep stage performance may occur from mismatches between source and target datasets, such as differences in population characteristics (e.g., an unrepresented sleep disorder) or sensors (e.g., alternative channel locations for wearable EEG). Methods: We investigated three strategies for training an automated single-channel EEG sleep stager: pre-training (i.e., training on the original source dataset), training-from-scratch (i.e., training on the new target dataset), and fine-tuning (i.e., training on the original source dataset, fine-tuning on the new target dataset). As source dataset, we used the F3-M2 channel of healthy subjects (N = 94). Performance of the different training strategies was evaluated using Cohen's Kappa (κ) in eight smaller target datasets consisting of healthy subjects (N = 60), patients with obstructive sleep apnea (OSA, N = 60), insomnia (N = 60), and REM sleep behavioral disorder (RBD, N = 22), combined with two EEG channels, F3-M2 and F3-F4. Results: No differences in performance between the training strategies was observed in the age-matched F3-M2 datasets, with an average performance across strategies of κ = .83 in healthy, κ = .77 in insomnia, and κ = .74 in OSA subjects. However, in the RBD set, where data availability was limited, fine-tuning was the preferred method (κ = .67), with an average increase in κ of .15 to pre-training and training-from-scratch. In the presence of channel mismatches, targeted training is required, either through training-from-scratch or fine-tuning, increasing performance with κ = .17 on average. Discussion: We found that, when channel and/or population mismatches cause suboptimal sleep staging performance, a fine-tuning approach can yield similar to superior performance compared to building a model from scratch, while requiring a smaller sample size. In contrast to insomnia and OSA, RBD data contains characteristics, either inherent to the pathology or age-related, which apparently demand targeted training.
Read full abstract