Incorporating statistical clustering methods into mortality models to improve forecasting performances

Cary Chi-Liang Tsai, Echo Sihan Cheng

doi:10.1016/j.insmatheco.2021.03.005

Abstract

Statistical clustering partitions a set of objects into classes (called clusters) such that objects in the same cluster are more similar to each other, with respect to selected features or characteristics, than to objects in any other cluster. In this paper, we apply four clustering approaches to improve the forecasting performance of the Lee–Carter and CBD models. First, each of the four clustering methods (Ward’s hierarchical clustering, divisive hierarchical clustering, K-means clustering, and Gaussian mixture model clustering) is used to determine, based on some characteristics of mortality rates, the number and partition of age clusters over the whole study age range 25-84. Next, we forecast 10-year and 20-year mortality rates for each age cluster using the Lee–Carter and CBD models. Finally, numerical illustrations are given, with the two R packages “NbClust” and “mclust” used for clustering. Mortality data for both genders of the US and the UK are obtained from the Human Mortality Database, and the MAPE (mean absolute percentage error) measure is adopted to evaluate forecasting performance. Comparing MAPE values with and without clustering demonstrates that all the proposed clustering methods can improve the forecasting performance of the Lee–Carter and CBD models.
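As a rough illustration of the evaluation measure named in the abstract, the sketch below computes MAPE in its standard form, the average of |forecast − actual| / |actual| expressed in percent. The function name `mape` and all numbers are hypothetical and chosen only for illustration; the abstract does not state the exact grid of ages and years over which the paper averages, so this should not be read as the authors' implementation.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent, over paired values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up mortality rates for three ages (illustration only, not from the paper).
observed = [0.0010, 0.0020, 0.0040]
forecast_without_clustering = [0.0011, 0.0018, 0.0044]
forecast_with_clustering = [0.0010, 0.0021, 0.0042]

mape_without = mape(observed, forecast_without_clustering)
mape_with = mape(observed, forecast_with_clustering)
print(f"MAPE without clustering: {mape_without:.2f}%")
print(f"MAPE with clustering:    {mape_with:.2f}%")
```

A lower MAPE indicates forecasts closer to the observed rates, which is the sense in which the comparison with and without clustering is made.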

Full Text