Abstract

The complexity of a Bayesian network (BN) model and the number of training samples used have a significant impact on the model's prediction accuracy for seismic liquefaction. The training sample size required to ensure that a BN model has high generalization ability is a critical issue in parameter learning. To address this issue, this study analyses the relationship between the predictive performance of a BN model and the model's complexity, the training sample size, and the average number of discrete intervals. Taking seismic liquefaction prediction as an example, 4536 statistical experiments are designed to investigate the training and testing performances of 21 different BN models under nine training sample size ratios (5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 80% of all data), with a testing set comprising 20% of all data. The results reveal that the learning performance of a BN is not sensitive to the training sample size but is related to the complexity of the model. The larger the training sample size, the stronger the model's generalization ability. The minimum training sample requirement is related to the maximum in-degree and the average number of discrete intervals, rather than to the number of nodes, the number of edges, or the maximum out-degree of the BN structure. In addition, a modified structural entropy characterizes the complexity of a BN structure better than the existing structural entropy, but it correlates less strongly with the minimum training sample requirement than the maximum in-degree does. To quickly determine the minimum training sample size required for a BN model to reach a predictive accuracy of 80%, a fitting function that considers the effects of the maximum in-degree and the average number of discrete intervals is presented, and its effectiveness is validated by two examples.
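The link between the maximum in-degree, the number of discrete intervals, and the required sample size follows from how conditional probability tables (CPTs) grow. The sketch below is not the paper's fitted function; it only illustrates, with standard BN parameter counting, why sample requirements scale with the maximum in-degree k and the average number of discrete intervals r (the per-configuration count `n_per_config` is a hypothetical rule of thumb, not a value from the study):

```python
# Sketch (hypothetical, not the paper's fitting function): a node with k
# parents, where every variable is discretized into r states, has a CPT
# with r**k parent configurations, each holding r - 1 free parameters.

def cpt_free_parameters(r: int, k: int) -> int:
    """Free parameters in one node's CPT: r**k configurations x (r - 1)."""
    return r ** k * (r - 1)

def rough_min_samples(r: int, k: int, n_per_config: int = 5) -> int:
    """Rule-of-thumb sample need: n_per_config observations per parent
    configuration of the most heavily connected node."""
    return n_per_config * r ** k

# The exponential dependence on k explains why maximum in-degree dominates
# node/edge counts as a predictor of the minimum training sample size.
for k in (1, 2, 3):
    print(k, cpt_free_parameters(3, k), rough_min_samples(3, k))
```

Because the table size is exponential in k but only polynomial in the node count, two structures with the same number of nodes and edges can have very different sample requirements, which is consistent with the abstract's finding.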
