Abstract

Purpose
The purpose of this paper is to investigate how expert overconfidence and dependence affect the calibration of aggregated probability judgments obtained by various linear opinion-pooling models.

Design/methodology/approach
The authors used a large database of real-world expert judgments and adopted leave-one-out cross-validation to test the calibration of aggregated judgments obtained by Cooke's classical model, the equal-weight linear pooling method, and the best-expert approach. Additionally, the significance of the effects was rigorously tested using linear models.

Findings
Significant differences were found between methods. Both linear-pooling aggregation approaches significantly outperformed the best-expert technique, indicating the need for inputs from multiple experts. The significant overconfidence effect suggests that linear-pooling approaches do not effectively counteract expert overconfidence. Furthermore, the second-order interaction between aggregation method and expert dependence shows that Cooke's classical model is more sensitive to expert dependence than equal weights, with high dependence generally leading to much poorer aggregated results; by contrast, the equal-weight approach is more robust across dependence levels.

Research limitations/implications
The results suggest that methods that broaden subjective confidence intervals or distributions may occasionally help mitigate the overconfidence problem, and that an equal-weight approach may be preferable when the level of dependence between experts is high. Although the number of experts and the number of seed questions were also found to significantly affect the calibration of the aggregated distribution, further research is needed to determine the minimum number of questions or experts required for satisfactory aggregated performance. Furthermore, other metrics or probability scoring rules should be used to check the robustness and generalizability of the authors' conclusions.

Originality/value
The paper provides empirical evidence on critical factors affecting the calibration of aggregated interval or distribution judgments obtained by linear opinion-pooling methods.
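To make the equal-weight linear pooling method referred to above concrete, the following is a minimal sketch, not the paper's implementation: each expert supplies a discrete probability distribution over the same set of outcomes, and the pooled judgment is the simple (equal-weight) average of those distributions. The function name and the three hypothetical expert distributions are illustrative assumptions.

```python
def equal_weight_pool(expert_dists):
    """Equal-weight linear opinion pool: elementwise average of the
    experts' discrete probability distributions (illustrative sketch)."""
    n = len(expert_dists)
    k = len(expert_dists[0])
    return [sum(d[i] for d in expert_dists) / n for i in range(k)]

# Three hypothetical experts judging the same 3-outcome event:
experts = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.3, 0.1],
]

pooled = equal_weight_pool(experts)
# The pool is again a valid distribution: it sums to 1 because
# each input distribution does and the weights (1/n) sum to 1.
```

Cooke's classical model differs only in the weights: instead of 1/n per expert, weights are derived from each expert's calibration and informativeness on seed questions, which is why (per the findings above) its output is more sensitive to expert dependence.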
