Machine Learning with High-Cardinality Categorical Features in Actuarial Applications

Benjamin Avanzi, Bernard Wong, Greg Taylor, Melantha Wang

doi:10.1017/asb.2024.7

Abstract

High-cardinality categorical features are pervasive in actuarial data (e.g., occupation in commercial property insurance). Standard categorical encoding methods like one-hot encoding are inadequate in these settings.

In this work, we present a novel Generalised Linear Mixed Model Neural Network (“GLMMNet”) approach to the modelling of high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model in a deep learning framework, offering the predictive power of neural networks and the transparency of random effects estimates, the latter of which cannot be obtained from the entity embedding models. Further, its flexibility to deal with any distribution in the exponential dispersion (ED) family makes it widely applicable to many actuarial contexts and beyond. In order to facilitate the application of GLMMNet to large datasets, we use variational inference to estimate its parameters: both traditional mean field and versions utilising textual information underlying the high-cardinality categorical features.

We illustrate and compare the GLMMNet against existing approaches in a range of simulation experiments as well as in a real-life insurance case study. A notable feature for both our simulation experiment and the real-life case study is a comparatively low signal-to-noise ratio, which is a feature common in actuarial applications. We find that the GLMMNet often outperforms or at least performs comparably with an entity-embedded neural network in these settings, while providing the additional benefit of transparency, which is particularly valuable in practical applications.

Importantly, while our model was motivated by actuarial applications, it can have wider applicability. The GLMMNet would suit any applications that involve high-cardinality categorical variables and where the response cannot be sufficiently modelled by a Gaussian distribution, especially where the inherent noisiness of the data is relatively high.
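
As a rough orientation only, a model of the kind the abstract describes can be sketched as a generalised linear mixed model whose fixed-effects part is a neural network, with a factorised Gaussian as the mean-field variational approximation. The notation below is an illustrative assumption and is not taken from the abstract: $f$ is the network, $\mathbf{z}_i$ is the one-hot indicator of the high-cardinality category for observation $i$, $\mathbf{u}$ is the vector of $K$ random effects, $g$ is a link function, $\phi$ is a dispersion parameter, and $m_j$, $s_j^2$ are variational parameters.

```latex
\begin{aligned}
y_i \mid \mathbf{u} &\sim \mathrm{ED}(\mu_i, \phi), \\
g(\mu_i) &= f(\mathbf{x}_i) + \mathbf{z}_i^\top \mathbf{u}, \\
\mathbf{u} &\sim \mathcal{N}(\mathbf{0}, \sigma_u^2 \mathbf{I}_K), \\
q(\mathbf{u}) &= \prod_{j=1}^{K} \mathcal{N}\!\left(u_j;\, m_j,\, s_j^2\right).
\end{aligned}
```

Under this sketch, the fitted $m_j$ would play the role of the per-category random effect estimates that the abstract refers to as the source of transparency.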


Journal: ASTIN Bulletin	Publication Date: Apr 11, 2024
Citations: 2	License type: CC BY 4.0
