Abstract

Predicting the risk of water shortage is critical, so methods are needed for estimating the parameters of statistical models when data are insufficient. Based on the maximum entropy principle, this paper proposes an alternative method of parameter estimation for a logistic regression model when the sample size is small. The new method requires very little data about risk factors, whereas maximum likelihood estimation requires a large quantity of data on both risk and risk factors. In addition, the paper applies a new formula for normalized information flow to select important risk factors; information flow is a physical notion logically associated with causality that can be used to quantify the cause–effect relation between dynamic events. Five experiments based on predictions of water shortage risk in the Beijing–Tianjin–Tangshan region are performed to validate the new method at different small sample sizes. The results show that the new method is generally reliable and performs much better than maximum likelihood estimation when only small samples are used. Specifically, an improvement of between 87.9% and 95.3% is observed when the number of samples is more than 15 and less than 30. The new method still generates an acceptable result using only 10 samples, whereas maximum likelihood estimation is unreliable in such situations.
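For readers unfamiliar with the baseline, the following is a minimal sketch of fitting a logistic regression model by maximum likelihood using plain gradient ascent. It illustrates only the conventional estimator that the abstract compares against, not the maximum entropy method proposed in the paper. The function names (`sigmoid`, `log_likelihood`, `fit_mle`), the learning rate, the step count, and the ten-sample toy data set are all hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(beta, x):
    """Intercept beta[0] plus the weighted sum of the risk factors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of binary outcomes y given factors X."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(linear_score(beta, xi))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_mle(X, y, lr=0.1, steps=5000):
    """Maximize the log-likelihood by batch gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_score(beta, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: one risk factor, 10 samples, 1 = shortage occurred.
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
y_toy = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

beta_hat = fit_mle(X_toy, y_toy)
```

With so few samples, estimates obtained this way can change substantially when a single observation is added or removed, which is the kind of small-sample weakness of maximum likelihood estimation that the abstract describes.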
