Expert workers make non-trivial decisions with significant implications. Experts’ decision accuracy is, thus, a fundamental aspect of their judgment quality, key to both management and consumers of experts’ services. Yet, in many important settings, transparency in experts’ decision quality is rarely possible because ground truth data for evaluating the experts’ decisions is costly and available only for a limited set of decisions. Furthermore, different experts typically handle exclusive sets of decisions, and thus, prior solutions that rely on the aggregation of multiple experts’ decisions for the same instance are inapplicable. We first formulate the problem of estimating experts’ decision accuracy in this setting and then develop a machine-learning-based framework to address it. Our method effectively leverages both abundant historical data on workers’ past decisions and scarce decision instances with ground truth labels. Using both semi-synthetic data based on publicly available data sets and purposefully compiled data sets on real workers’ decisions, we conduct extensive empirical evaluations of our method’s performance relative to alternatives. The results show that our approach is superior to existing alternatives across diverse settings, including settings that involve different data domains, experts’ qualities, and amounts of ground truth data. To our knowledge, this paper is the first to posit and address the problem of estimating experts’ decision accuracies from historical data with scarce ground truth, and it is the first to offer comprehensive results for this problem setting, establishing the performances that can be achieved across settings as well as the state-of-the-art performance on which future work can build.
This paper was accepted by Anindya Ghose, information systems.
Funding: T. Geva acknowledges research grants from the Jeremy Coller Foundation and from the Henry Crown Institute for Business Research.
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2021.03357.