Abstract

Comparing classifier performances may seem a mundane task, yet it remains something of a sideshow in machine learning. The paired t-test is commonly used for this purpose, but it requires that the two classifiers be run simultaneously, or that simultaneous runs be simulated. This is not always possible, and it can entail building a superstructure solely for that purpose. Moreover, the suitability of the t-test in this context has itself been questioned, and the literature on alternatives is rather involved and does not measure up to the scale of the issue. In this paper, the topics connected with accuracy calculation are surveyed once more, with emphasis on the variation of results. The well-known technique of multifold cross-validation is exemplified. A simplified methodology for comparing classifier performances is proposed, based on the mean and variance of accuracy and on computing differences between objects defined in these terms. It is applied to naive Bayesian and decision tree classifiers implemented on different platforms. The lazy learning approach, applicable to decision trees in discrete domains, is followed closely, together with a proposal for how it can be improved. Examples are drawn from the field of health diagnostics.
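For illustration only, the following is a minimal sketch of the kind of comparison the abstract refers to: per-fold accuracies of a naive Bayesian and a decision tree classifier are estimated by multifold cross-validation and then summarized by the mean and variance of accuracy, with the conventional paired t-test shown for contrast. It is not the paper's exact procedure; the dataset (scikit-learn's breast cancer data, chosen as a health-diagnostics stand-in), the fold count, and the classifier settings are all assumptions.

# Sketch: compare two classifiers by the mean and variance of their
# cross-validated accuracies, with a paired t-test for contrast.
# Dataset, fold count, and classifier settings are illustrative assumptions.
import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Per-fold accuracies on identical folds for both classifiers.
acc_nb = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="accuracy")
acc_dt = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=cv, scoring="accuracy")

# Summary by accuracy mean and variance, as in the simplified methodology.
for name, acc in (("naive Bayes", acc_nb), ("decision tree", acc_dt)):
    print(f"{name}: mean accuracy = {acc.mean():.4f}, variance = {acc.var(ddof=1):.6f}")

# Conventional paired t-test on the same folds, shown only for comparison.
t, p = stats.ttest_rel(acc_nb, acc_dt)
print(f"paired t-test: t = {t:.3f}, p = {p:.3f}")

The point of the mean-and-variance summary is that it needs only each classifier's own accuracy distribution, so the two classifiers do not have to be run side by side on a shared experimental superstructure, which is the practical difficulty with the paired t-test noted above.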
