Classifying longevity profiles through longitudinal data mining

Luis Enrique Zárate, Caio Eduardo Ribeiro

doi:10.1016/j.eswa.2018.09.035

Abstract

Population studies of human ageing often generate high-dimensional longitudinal datasets. To discover knowledge in such datasets, the traditional knowledge discovery in databases process needs to be adapted. In this article, we present a full knowledge discovery process performed on a longitudinal dataset and highlight its particularities. We investigated the database of the English Longitudinal Study of Ageing (ELSA), employing both semi-supervised and supervised learning techniques to determine and describe the profiles of study participants annotated with the class labels “short-lived” and “long-lived”. We report on the data preprocessing, the clustering task of finding the best sets of representatives of the profiles of each class, and the use of supervised learning to describe these profiles and to perform a longitudinal classification of the dataset, investigating how consistently the unlabelled records fit into the classes. The results show that several aspects discriminate individuals between the longevity profiles, including economic, social and health-related attributes. The findings point towards a need to further investigate the relationships between these aspects, especially those related to physical health and wellbeing, and how they affect the lifespan of an individual. Furthermore, our methodology and the adopted procedures can be applied to other data mining applications for longitudinal studies of ageing.

Full Text