The creation of Bayesian networks often requires the specification of a large number of parameters, making it highly desirable to be able to learn these parameters from historical data. In many cases, such data has uncertainty associated with it, including cases in which this data comes from unstructured analysis or from sensors. When creating diagnosis networks, for example, unstructured analysis algorithms can be run on the historical text descriptions or images of previous cases so as to extract data for learning Bayesian network parameters, but such derived data has inherent uncertainty associated with it due to the nature of such algorithms. Because of the inability of current Bayesian network parameter learning algorithms to incorporate such uncertainty, common approaches either ignore this uncertainty, thus reducing the resulting accuracy, or completely disregard such data. We present an approach for learning Bayesian network parameters that explicitly incorporates such uncertainty, and which is a natural extension of the Bayesian network formalism. We present a generalization of the Expectation Maximization parameter learning algorithm that enables it to handle any historical data with likelihood-evidence-based uncertainty, as well as an empirical validation demonstrating the improved accuracy and convergence enabled by our approach. We also prove that our extended algorithm maintains the convergence and correctness properties of the original EM algorithm, while explicitly incorporating data uncertainty in the learning process.
Read full abstract