This study aimed to develop models that predict the onset of specific diseases in cats from pet insurance data and to evaluate their predictive performance. Agria Pet Insurance data from almost 550,000 cats (2011 to 2016) were analyzed and used to train predictive models for periodontal disease and skin tumors using breed, sex, and insurance claim history. The dataset was balanced by random downsampling and 1:1 matching on age, insurance duration, and time at risk. Variables were then further processed, and random forest and conditional logistic regression were used for analysis. Model accuracy was assessed through leave-one-out cross-validation, while variable importance plots, partial dependence plots, and coefficients were used for model interpretation. Model accuracy ranged from 81.9% to 88.2% (P < .01, baseline 50%). Key predictors included prior insurance claims for "digestive," "whole body symptom," "skin," and "injury" conditions; these predictors may be nonspecific and predictive of various diseases. Maine Coon, Siamese, and Burmese cats were associated with periodontal disease-positive predictions, while domestic cats were linked with negative predictions. For skin tumors, Norwegian Forest Cats, Devon Rex and Sphynx cats, and Maine Coon cats were associated with positive predictions, whereas Birman and domestic cats were linked with negative predictions. This study presents a method of machine learning predictive analysis on pet insurance data, although more comprehensive medical information and approaches that account for the characteristics of the data may be necessary to develop clearer predictors. To prevent or detect these conditions early, veterinarians can use the breed risk results to guide clients, especially those with high-risk breeds, by offering early advice on lifestyle and monitoring.
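The balancing step described above can be illustrated with a minimal sketch. It assumes exact matching on the matching variables and random selection among eligible controls; the abstract does not state whether matching was exact or within a tolerance, and the function name, field names, and records below are hypothetical, not taken from the study.

```python
import random
from collections import defaultdict


def match_one_to_one(cases, controls, keys, seed=0):
    """Pair each case with one unused control that shares the same matching keys.

    Assumption: exact matching on `keys`; cases without an eligible control
    are dropped, and each control is used at most once.
    """
    rng = random.Random(seed)

    # Group controls by their matching-key values.
    pool = defaultdict(list)
    for control in controls:
        pool[tuple(control[k] for k in keys)].append(control)

    # Shuffle within each group so the chosen control is random.
    for group in pool.values():
        rng.shuffle(group)

    pairs = []
    for case in cases:
        candidates = pool.get(tuple(case[k] for k in keys))
        if candidates:
            pairs.append((case, candidates.pop()))
    return pairs


# Hypothetical records: cats with a claim for the disease (cases) and without (controls).
cases = [
    {"id": "c1", "age_years": 5, "insured_years": 3},
    {"id": "c2", "age_years": 8, "insured_years": 6},
    {"id": "c3", "age_years": 2, "insured_years": 1},
]
controls = [
    {"id": "k1", "age_years": 5, "insured_years": 3},
    {"id": "k2", "age_years": 5, "insured_years": 3},
    {"id": "k3", "age_years": 8, "insured_years": 6},
    {"id": "k4", "age_years": 11, "insured_years": 9},
]

pairs = match_one_to_one(cases, controls, keys=("age_years", "insured_years"))

# Every retained case has exactly one control, so half of the matched records are positive.
positives = len(pairs)
baseline_accuracy = positives / (2 * len(pairs))
```

Because each retained case contributes exactly one control, the matched dataset is half positive and half negative, which is why a trivial classifier reaches only the 50% baseline against which the reported accuracies are compared.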