Abstract

A considerable number of metrics can be used to evaluate the performance of machine learning algorithms. While much work is dedicated to the study and improvement of data quality and model performance, much less research is focused on the evaluation metrics themselves: their intrinsic relationships, and the interplay between metrics, models, data, and the conditions in which they are applied. While some work has been conducted on general machine learning tasks such as classification, fewer efforts have been dedicated to more complex problems such as object detection and image segmentation, in which the evaluation of performance can vary drastically depending on the objectives and domains of application. Working in an agricultural context, specifically on the problem of the automatic detection of plants in proximal sensing images, we studied twelve evaluation metrics that we used to evaluate three image segmentation models recently presented in the literature. After a unified presentation of these metrics, we carried out an exploratory analysis of their relationships using a correlation analysis, a clustering of variables, and two factorial analyses (namely principal component analysis and multiple factorial analysis). We distinguished three groups of highly linked metrics and, through visual inspection of the representative images of each group, identified the aspects of segmentation that each group evaluates. The aim of this exploratory analysis was to provide some clues to practitioners for understanding and choosing the metrics that are most relevant to their agricultural task.

Highlights

  • The performance evaluation of machine learning (ML) models is, from an applicative perspective, perhaps the most crucial step in the predictive pipeline as it is often framed as a decisional step [1]

  • These two assumptions rely on an understanding of the theoretical and behavioural aspects of the evaluation metrics used, whose lack of clarity often increases with the complexity of the ML task

  • We presented and analysed 12 classification metrics in the context of plant image segmentation

Introduction

The performance evaluation of machine learning (ML) models is, from an applicative perspective, perhaps the most crucial step in the predictive pipeline, as it is often framed as a decisional step [1]. Such a decision is often based on the “trust” attributed to the computed metrics and founded on two implicit assumptions: that the metrics reflect the true performance of the model on the test examples, and that they reflect the aspects of performance that are relevant to the application at hand. These two assumptions rely on an understanding of the theoretical and behavioural aspects of the evaluation metrics used, whose lack of clarity often increases with the complexity of the ML task (for example, classical binary classification metrics such as precision and recall are much easier to understand than their multiclass counterparts). In complex tasks, the usage of simple metrics entails a simplification of the problem, leading to a loss of information about
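To make the metrics under discussion concrete, the pixel-wise confusion counts underlying segmentation metrics such as precision, recall, IoU (Jaccard index), and Dice can be sketched as follows. This is an illustrative example, not code from the paper; the function name and the toy masks are our own, and only the binary (plant vs. background) case is shown:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary segmentation masks (1 = plant, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # plant pixels correctly detected
    fp = np.logical_and(pred, ~truth).sum()   # background wrongly labelled as plant
    fn = np.logical_and(~pred, truth).sum()   # plant pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0          # Jaccard index
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0  # F1 on pixels
    return {"precision": precision, "recall": recall, "iou": iou, "dice": dice}

# Toy 3x3 masks: the prediction misses one plant pixel and adds one false one,
# so tp = 2, fp = 1, fn = 1.
truth = [[1, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
pred = [[1, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
m = segmentation_metrics(pred, truth)
```

On this toy pair, precision and recall are both 2/3 while IoU is 0.5, already showing how different metrics rank the same prediction differently.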

