Abstract

Supervised machine learning algorithms rarely cope directly with the temporal information inherent to longitudinal datasets, which have multiple measurements of the same feature across several time points and are often generated by large health studies. In this paper we report on experiments which adapt the feature-selection function of decision tree-based classifiers to consider the temporal information in longitudinal datasets, using a lexicographic optimisation approach. This approach gives higher priority to the usual objective of maximising the information gain ratio, and it favours the selection of features more recently measured as a lower priority objective. Hence, when selecting between features with equivalent information gain ratio, priority is given to more recent measurements of biomedical features in our datasets. To evaluate the proposed approach, we performed experiments with 20 longitudinal datasets created from a human ageing study. The results of these experiments show that, in addition to an improvement in predictive accuracy for random forests, the changed feature-selection function promotes models based on more recent information that is more directly related to the subject’s current biomedical situation and, thus, intuitively more interpretable and actionable.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.