The generalized linear model (GLM) is a popular modeling choice for pricing non-life insurance policies. However, high-cardinality categorical insurance data presents significant challenges for these GLM rate-making models. Additionally, insurance regulators often require rating territories, which are clusters of insurance policies’ geographic locations used for setting insurance rates, to meet certain standards. For instance, (1) the credibility standard ensures that the number of policies in a territory is large enough to be credible and representative, (2) the contiguity standard requires the locations in each territory to be geographically adjacent to promote a logical and practical spatial grouping, and (3) the cardinality standard specifies an acceptable range for the number of territories in a geographic area. To address these challenges, this article proposes a nested GLM framework for non-life insurance rate-making applications. In this framework, neural network models with categorical embedding layers are constructed to model the residual deviance from simple GLMs, using high-cardinality categorical variables as input. Low-dimensional features extracted from the neural network model effectively translate categorical variables into meaningful numerical representations, capturing their effects on the initial model’s residuals. The features corresponding to the location-related variable are further converted into a contiguous territory rating variable via spatially constrained clustering models. By incorporating outcomes from these models, the nested GLM not only satisfies regulatory requirements but also enhances the model’s predictive power, while maintaining the interpretability of the (generalized) linear form. The construction of a nested Poisson GLM is presented in this article. Its performance in modeling claim frequency is demonstrated using a real-life Brazilian auto insurance dataset.
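To make the three territory standards concrete, the following is a minimal Python sketch of how a candidate territory assignment might be checked against them. It is an illustration only and not the article's method: the function name `check_territories`, the data layout (dictionaries keyed by location), the toy locations A to D, and all thresholds are hypothetical.

```python
from collections import defaultdict, deque


def check_territories(assignment, adjacency, policy_counts,
                      min_policies, min_territories, max_territories):
    """Check a territory assignment against the three standards.

    assignment:    dict mapping location -> territory label
    adjacency:     dict mapping location -> set of neighbouring locations
    policy_counts: dict mapping location -> number of policies
    All thresholds are hypothetical inputs, not regulatory values.
    """
    # Group locations by territory.
    members = defaultdict(set)
    for loc, terr in assignment.items():
        members[terr].add(loc)

    # (1) Credibility: every territory holds enough policies.
    credible = all(
        sum(policy_counts[loc] for loc in locs) >= min_policies
        for locs in members.values()
    )

    # (2) Contiguity: every territory forms one connected component
    # of the adjacency graph (breadth-first search within the territory).
    def is_connected(locs):
        start = next(iter(locs))
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in adjacency.get(current, ()):
                if nb in locs and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        return seen == locs

    contiguous = all(is_connected(locs) for locs in members.values())

    # (3) Cardinality: the number of territories lies in the allowed range.
    cardinality_ok = min_territories <= len(members) <= max_territories

    return {"credibility": credible,
            "contiguity": contiguous,
            "cardinality": cardinality_ok}


# Toy example: four locations arranged in a line, A - B - C - D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
policy_counts = {"A": 120, "B": 80, "C": 150, "D": 60}

good = {"A": 1, "B": 1, "C": 2, "D": 2}   # neighbours grouped together
bad = {"A": 1, "C": 1, "B": 2, "D": 2}    # non-adjacent locations grouped

result = check_territories(good, adjacency, policy_counts, 200, 2, 3)
bad_result = check_territories(bad, adjacency, policy_counts, 200, 2, 3)
```

In this toy setup the first assignment passes all three checks, while the second fails contiguity (A and C are not adjacent) and credibility (one territory holds only 140 policies). A spatially constrained clustering model enforces contiguity by construction, so a check of this kind would mainly serve as a sanity test on its output.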