Abstract
Expert judgment elicitation is often required in probabilistic decision making and the evaluation of risk. One measure of the quality of the probability distributions given by experts is calibration: the faithfulness of the stated probabilities in an empirically verifiable sense. A method of measuring calibration for continuous probability distributions is presented here. The impact of using linear rules to combine such judgments is discussed and then demonstrated empirically with data collected from experts participating in a large-scale risk study. It is shown by theoretical argument that combining the well-calibrated distributions of individual experts with linear rules can only reduce calibration. In contrast, it is demonstrated, both by example and empirically, that an equally weighted linear combination of experts who tend to be “overconfident” can produce distributions that are better calibrated than the experts’ individual distributions. Using data from training exercises, it is shown that the improvement in calibration is rapid as the number of experts increases from one to five or six, with only modest improvement from adding experts beyond that point.
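To make the combination and calibration ideas concrete, the sketch below illustrates, on synthetic data, an equally weighted linear opinion pool of continuous expert distributions and a simple calibration check based on probability integral transform (PIT) values. This is not the paper's calibration measure; the normal expert distributions, the overconfidence level, and the Kolmogorov–Smirnov summary of the PIT values are all illustrative assumptions, and the helper names (`pit`, `ks_calibration`) are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_vars = 200      # calibration variables with known realizations
n_experts = 6     # experts contributing to the equal-weight pool

# Synthetic "truth": each variable has an underlying mean and an observed realization.
true_mu = rng.normal(0.0, 5.0, size=n_vars)
realizations = true_mu + rng.normal(0.0, 1.0, size=n_vars)

# Each hypothetical expert reports a normal distribution. The reported spread (0.6)
# is too narrow relative to the expert's actual error (~1.4), i.e. overconfidence.
expert_mu = true_mu[None, :] + rng.normal(0.0, 1.0, size=(n_experts, n_vars))
expert_sigma = np.full((n_experts, n_vars), 0.6)

def pit(mu, sigma, x):
    """Probability integral transform F(x) under a reported normal distribution."""
    return stats.norm.cdf(x, loc=mu, scale=sigma)

def ks_calibration(u):
    """KS distance of PIT values from Uniform(0,1); smaller means better calibrated."""
    return stats.kstest(u, "uniform").statistic

# Calibration of a single overconfident expert.
u_single = pit(expert_mu[0], expert_sigma[0], realizations)

# Equal-weight linear pool: the pooled CDF is the average of the expert CDFs,
# so its PIT at each realization is the average of the individual PIT values.
u_pool = pit(expert_mu, expert_sigma, realizations[None, :]).mean(axis=0)

print("single expert KS distance:", round(ks_calibration(u_single), 3))
print("equal-weight pool KS distance:", round(ks_calibration(u_pool), 3))
```

Under these assumptions the pooled PIT values are closer to uniform than any single expert's, mirroring the abstract's point that an equal-weight linear combination of overconfident experts can be better calibrated than its members; if the individual experts were instead well calibrated, the same pooling would widen the distributions and degrade calibration.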