Abstract

Capturing occupant action and clothing information is important for applying occupant-centric control (OCC) to mitigate energy overuse and improve indoor environmental quality. This study therefore introduces a vision-based multi-label detection framework for automatically capturing occupant actions and clothing. Specifically, a single-stage architecture is designed that simultaneously detects human body bounding boxes, action classes, and clothing classes. The framework also adopts a training strategy that allows action and clothing data to be trained concurrently. Experimental results show that the proposed framework reliably detects occupant actions and clothing, achieving a mean average precision (mAP) of 45.0% and approximately doubling the inference speed compared with a multi-stage detection framework. This advancement paves the way for enhanced OCC systems by ensuring the diversity and variability of collected occupant information.
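The abstract describes the single-stage, multi-label architecture only at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of the general idea: shared detector features feed parallel prediction branches so that body bounding boxes, action classes, and clothing classes are produced in a single forward pass. The class counts, anchor count, and layer sizes are illustrative assumptions.

```python
# Minimal sketch of a single-stage, multi-label detection head (assumed design,
# not the paper's exact architecture): one shared feature map, three parallel
# branches predicting boxes, action classes, and clothing classes together.
import torch
import torch.nn as nn

class MultiLabelDetectionHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=3,
                 num_actions=6, num_clothing=4):
        super().__init__()
        # Box regression branch: 4 offsets (x, y, w, h) per anchor.
        self.box_head = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)
        # Action classification branch: one score per action class per anchor.
        self.action_head = nn.Conv2d(in_channels, num_anchors * num_actions, kernel_size=1)
        # Clothing classification branch: one score per clothing class per anchor.
        self.clothing_head = nn.Conv2d(in_channels, num_anchors * num_clothing, kernel_size=1)

    def forward(self, features):
        # All predictions come from the same feature map, so a single forward
        # pass yields boxes, actions, and clothing labels simultaneously,
        # avoiding a separate per-person classification stage.
        return {
            "boxes": self.box_head(features),
            "actions": self.action_head(features),
            "clothing": self.clothing_head(features),
        }

# Example: a 256-channel feature map from a backbone (batch of 1, 20x20 grid).
head = MultiLabelDetectionHead()
outputs = head(torch.randn(1, 256, 20, 20))
print({k: tuple(v.shape) for k, v in outputs.items()})
```

Because the action and clothing branches share the same backbone and are optimized jointly, samples labeled for either task can contribute to the same training run, which is consistent with the concurrent training strategy described above.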
