Supervised pattern classification relies on a labeled training set to learn decision boundaries that separate samples from different classes. Such samples can be either weakly or reliably labeled: in the first case, one can employ techniques specifically designed to cope with labeling uncertainty, and in the second, one can rely on numerous alternatives, including metric learning. Pattern classifiers usually adopt the Euclidean distance to compare samples and assess their proximity, which implicitly assumes that the feature space is flat. In some applications, however, samples are embedded in curved spaces, although this is not straightforward to prove. In this manuscript, we assess the performance of the Optimum-Path Forest (OPF) classifier under different distance functions, which are used to weight the arcs between samples in a graph that encodes the feature space. We compare 47 distance measures applied to the OPF classifier on 22 datasets, and also compare against Decision Trees, Logistic Regression, and Support Vector Machines. The experiments show that OPF readily accommodates different distance measures and, in some situations, achieves better accuracy than both its standard (Euclidean) counterpart and the classifiers mentioned above. On the other hand, time-consuming distance calculations may reduce OPF's efficiency during inference.
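To make the idea of distance-weighted arcs concrete, the following is a minimal sketch of how a pluggable distance function could weight the arcs of a complete graph over the samples. It is not the OPF classifier itself and not the authors' implementation: the function names (`euclidean`, `manhattan`, `chebyshev`, `arc_weights`) and the toy data are assumptions made for illustration, and the three distances are common examples that are not necessarily among the 47 measures evaluated in the paper.

```python
import numpy as np

# Three common distance functions, chosen only for illustration.
def euclidean(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))

def chebyshev(a, b):
    return float(np.max(np.abs(a - b)))

def arc_weights(samples, distance):
    """Return the symmetric matrix of arc weights for a complete graph
    whose nodes are the samples, using the given distance function."""
    n = len(samples)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            weights[i, j] = weights[j, i] = distance(samples[i], samples[j])
    return weights

# Hypothetical toy data: three samples with two features each.
samples = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]])

for name, fn in [("euclidean", euclidean),
                 ("manhattan", manhattan),
                 ("chebyshev", chebyshev)]:
    # Weight of the arc between the first two samples under each distance.
    print(name, arc_weights(samples, fn)[0, 1])
```

Because the distance is passed in as an ordinary function, swapping measures changes only the arc weights and leaves the graph construction untouched. The double loop also shows why an expensive distance function can dominate the running time.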