Evaluation metrics systematization for 2D human poses analysis models

Svitlana G Antoshchuk, Anastasiia A Breskina

doi:10.15276/hait.06.2023.2

Abstract

This paper describes the systematization of evaluation metrics for 2D human pose analysis models. Detection, tracking, and recognition of human actions are among the most popular tasks solved with machine learning (ML) methods in various practical applications. Many different metrics exist, each evaluating a model from a particular point of view, and each task is evaluated with its own set of metrics. However, as the literature analysis shows, the vast number of metric definitions, together with the use of different terms and multiple representations of the same ideas, makes it difficult to interpret and compare different ML models and methods for detecting, tracking, and recognizing human actions. The purpose of this work is to analyze the metrics used to evaluate methods for processing 2D human poses in video in order to facilitate an informed choice of metrics. To improve the objectivity of evaluating the results of empirical studies of existing and newly developed methods and models for detecting, tracking, and recognizing human actions, a systematization of existing metrics into subgroups was proposed, depending on the task they evaluate. Four classes of evaluation metrics were introduced: classification metrics, keypoint detection metrics, object tracking metrics, and general metrics. Classification metrics are based on evaluating the quality of predicted bounding boxes and matching their values against ground truths. Keypoint detection metrics focus on the quality of the detected joints of the human body skeleton. Tracking metrics evaluate object detection in each frame and the correctness of the determined trajectory. General metrics are not specifically tied to any of the 2D human pose analysis tasks.
A prototype application based on the proposed metrics systematization was developed to help data scientists formalize the choice of metrics for evaluating models, depending on the ML problem being solved and the application area. To evaluate and demonstrate the metrics suggested by this application, the Faster R-CNN, SSD, and YOLOv3 object detection models were analyzed and compared within the 2D human pose analysis application area. The analysis showed that Faster R-CNN and YOLOv3 give the most accurate responses, although they have the disadvantage of a high false positive rate. The implementation also showed that metrics based on true negative values are uninformative when working with bounding boxes, because of the specifics of the application area and the inability to calculate true negatives on image data.
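
The matching of predicted bounding boxes with ground truths, and the reason true negatives are unavailable, can be illustrated with a minimal sketch. It is not the prototype described above: the greedy one-to-one matching, the IoU threshold of 0.5, and all function names (`iou`, `count_matches`, `precision_recall`) are assumptions chosen for illustration.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(
    predictions: List[Box], ground_truths: List[Box], threshold: float = 0.5
) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truths; return (TP, FP, FN).

    Each ground truth can be matched at most once. No true negative count
    is returned: the set of "boxes that are correctly absent" in an image
    is unbounded, so it cannot be enumerated.
    """
    matched = set()
    tp = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            score = iou(pred, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx >= 0 and best_iou >= threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp    # predictions with no matching ground truth
    fn = len(ground_truths) - tp  # ground truths that no prediction matched
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall; both are computable without true negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    gts = [(0, 0, 10, 10), (21, 21, 31, 31), (100, 100, 110, 110)]
    tp, fp, fn = count_matches(preds, gts)
    print(tp, fp, fn, precision_recall(tp, fp, fn))
```

With these hypothetical boxes, two predictions overlap a ground truth strongly enough to count as true positives, one is a false positive, and one ground truth is missed. Metrics that require a true negative count, such as specificity, have no input to work from in this setting.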
