Summary

Objectives: Camera-based vital sign estimation allows the contactless assessment of important physiological parameters. Seminal contributions were made in the 1930s, 1980s, and 2000s, and the pace of development appears to be accelerating. In this survey, we aim to give an overview of the most recent works in this area, describe their common features as well as shortcomings, and highlight interesting "outliers".

Methods: We performed a comprehensive literature search and quantitative analysis of papers published between 2016 and 2018. We extracted quantitative information about the number of subjects, studies with healthy volunteers vs. pathological conditions, public datasets, laboratory vs. real-world works, types of camera, usage of machine learning, and spectral properties of data. We also performed a qualitative analysis of the illumination used and of recent advances in algorithmic development.

Results: Since 2016, 116 papers were published on camera-based vital sign estimation, and 59% of them presented results on 20 or fewer subjects. While the average number of participants increased from 15.7 in 2016 to 22.9 in 2018, the vast majority of papers (n=100) were on healthy subjects. Four public datasets were used in 10 publications. We found 27 papers whose application scenario could be considered a real-world use case, such as monitoring during exercise or driving. These include 16 papers that dealt with non-healthy subjects. The majority of papers (n=61) presented results based on visual, red-green-blue (RGB) information, followed by RGB combined with other parts of the electromagnetic spectrum (n=18) and thermography only (n=12), while the remaining works (n=25) used other mono- or polychromatic non-RGB data. Surprisingly, only a minority of publications (n=39) made use of consumer-grade equipment. Lighting conditions were primarily uncontrolled or ambient.
While some works focused on specialized aspects, such as the removal of vital sign information from video streams to protect privacy or the influence of video compression, most algorithmic developments were related to three areas: region of interest selection, tracking, or extraction of a one-dimensional signal. Seven papers used deep learning techniques, 17 papers used other machine learning approaches, and 92 made no explicit use of machine learning.

Conclusion: Although some general trends and frequent shortcomings are evident, the spectrum of publications related to camera-based vital sign estimation is broad. While many creative solutions and unique approaches exist, the lack of standardization hinders comparability of these techniques and of their performance. We believe that sharing algorithms and/or datasets would alleviate this and allow the application of newer techniques such as deep learning.