Abstract

Item Response Theory (IRT) is a tool developed in psychometrics to measure the latent abilities of human respondents based on their responses to items with different levels of difficulty. Recently, IRT has been applied to evaluation in AI by treating the algorithms as respondents and the AI tasks as items. In machine learning in particular, IRT has been applied to the evaluation of classifiers based on their predictions for each test instance. Based on a matrix of responses (classifiers vs instances), the IRT model estimates the latent difficulty and discrimination of each instance, as well as the ability of each classifier, in such a way that a classifier receives a high ability value when it tends to correctly classify the most difficult instances. The IRT models previously adopted for evaluation in classification are not directly applicable to regression, since they rely on dichotomous responses (i.e., a response has to be either correct or incorrect). In this paper we propose a new IRT model, designed specifically for nonnegative unbounded responses, which makes it suitable for modelling the absolute errors of regression algorithms. In the proposed model, responses follow a gamma distribution, parameterised according to respondents’ abilities and items’ difficulty and discrimination parameters. The proposed parameterisation results in item characteristic curves with more flexible shapes than the logistic curves widely adopted in IRT. The proposed model was evaluated with diverse regression algorithms and two benchmark datasets, one synthetic and one real. Useful insights were derived by inspecting regions in these datasets that present different levels of difficulty and discrimination.
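
To make the contrast between the two kinds of response concrete, the following is a minimal sketch, not the model proposed in the paper. It shows the standard two-parameter logistic (2PL) item characteristic curve used with dichotomous responses, and how a response matrix could be built for classifiers (correct or incorrect) and for regressors (absolute errors). The function names and the rows-as-respondents layout are assumptions made for illustration. The gamma parameterisation is not given in this abstract, so it is not reproduced here.

```python
import numpy as np


def icc_2pl(theta, difficulty, discrimination):
    """Standard 2PL logistic curve: probability of a correct response.

    theta is the respondent's ability; difficulty and discrimination
    are the item parameters. (Hypothetical helper name.)
    """
    return 1.0 / (1.0 + np.exp(-discrimination * (theta - difficulty)))


def dichotomous_responses(y_true, predictions):
    """Response matrix for classifiers (assumed layout: rows are
    classifiers, columns are instances); 1 = correct, 0 = incorrect."""
    return (np.asarray(predictions) == np.asarray(y_true)).astype(int)


def absolute_error_responses(y_true, predictions):
    """Response matrix for regressors (assumed layout: rows are
    regressors, columns are instances); entries are absolute errors,
    i.e. nonnegative and unbounded."""
    y_true = np.asarray(y_true, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.abs(predictions - y_true)
```

In the dichotomous case every entry is 0 or 1, which is what the logistic curve models. In the regression case the entries can be any nonnegative real number, which is the kind of response the abstract says the gamma-based model is designed for.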
