Atrial fibrillation (AF) screening from face videos has become popular with the recent rise of telemedicine and telehealth. In this study, the largest facial video database for camera-based AF detection is presented. It contains 657 participants from two clinical sites, each recorded for about 10 minutes of video, which yields over 10,000 segments of roughly 30 seconds each; this duration follows the guideline for AF diagnosis. Notably, 2,979 segments are labeled segment-wise, i.e., each rhythm is independently labeled as AF or not, and all labels are manually confirmed by a cardiologist. Data collection covers varied environments, talking, facial expressions, and head movements, matching the conditions of practical use. For camera-based AF screening, a novel CNN-based architecture equipped with an attention mechanism is proposed. It fuses heartbeat consistency, heart rate variability derived from remote photoplethysmography (rPPG), and motion features simultaneously to produce reliable outputs. With the proposed model, intra-database evaluation reaches 96.62% sensitivity, 90.61% specificity, and an AUC of 0.96. Furthermore, to thoroughly assess the generalization capability of the proposed method, cross-database evaluation is also conducted; performance averages about 90%, with AUCs above 0.94 at both clinical sites.
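The abstract does not detail the attention mechanism, so the following is only a minimal sketch of one plausible reading: softmax attention weights over the three feature streams (heartbeat consistency, rPPG-derived heart rate variability, and motion), with the fused vector as their weighted sum. All names, dimensions, and scores below are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fuse(features, scores):
    """Fuse per-stream feature vectors with softmax attention weights.

    features: list of same-length 1-D arrays, one per stream.
    scores:   1-D array of raw attention scores, one per stream.
    Returns the attention-weighted sum and the normalized weights.
    """
    w = softmax(scores)
    fused = sum(wi * f for wi, f in zip(w, features))
    return fused, w

# Hypothetical 4-dimensional embeddings for the three streams.
rng = np.random.default_rng(0)
consistency = rng.normal(size=4)  # heartbeat-consistency features
hrv = rng.normal(size=4)          # heart rate variability from rPPG
motion = rng.normal(size=4)       # head/facial motion features

fused, weights = attention_fuse(
    [consistency, hrv, motion],
    np.array([0.5, 1.0, -0.2]),   # hypothetical learned scores
)
```

In a learned model the scores would come from a small network conditioned on the inputs; here they are fixed constants purely to illustrate the fusion step.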