Abstract

Ageing is one of the most complicated processes in biological development and remains poorly understood. A growing number of ageing-related gene datasets are becoming available online, in which each instance is characterized by a set of hierarchically organized binary features. Traditional data mining methods are limited in their ability to exploit this hierarchical feature space. This paper proposes a hybrid hierarchical feature selection (HHFS) method for classifying genes as pro-longevity or anti-longevity. HHFS performs lazy and eager feature selection sequentially, taking into account both the unique characteristics of each test instance and the overall characteristics of the dataset. It adopts two complementary relevancy metrics (i.e., Gini purity and mutual information) to remove hierarchical redundancy. Experiments are conducted on ageing-related gene data from four model organisms. The results show that HHFS achieves significantly better prediction performance than several state-of-the-art methods.
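
To make the idea of hierarchical redundancy concrete, the following is a minimal illustrative sketch, not the HHFS algorithm itself. It assumes a hypothetical toy hierarchy of three binary features (`A` is an ancestor of `B`, which is an ancestor of `C`), scores each feature by its mutual information with the class label, and keeps a feature only if no ancestor or descendant scores higher. The function names, the toy data, and the tie-breaking rule are all assumptions made for illustration; the Gini-based metric and the lazy and eager stages described above are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between a binary feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    l_counts = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        p_fl = c / n
        # p(f, l) / (p(f) * p(l)) written with raw counts
        mi += p_fl * math.log2(c * n / (f_counts[f] * l_counts[l]))
    return mi


def drop_hierarchical_redundancy(relevance, ancestors):
    """Keep a feature only if no ancestor or descendant is strictly more relevant."""
    kept = []
    for f in relevance:
        related = [
            g for g in relevance
            if g != f and (g in ancestors[f] or f in ancestors[g])
        ]
        if all(relevance[f] >= relevance[g] for g in related):
            kept.append(f)
    return kept


# Hypothetical toy data: if a feature is 1, all of its ancestors are also 1.
features = {
    "A": [1, 1, 1, 1, 0, 0],
    "B": [1, 1, 1, 0, 0, 0],
    "C": [1, 0, 0, 0, 0, 0],
}
ancestors = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
labels = [1, 1, 1, 0, 0, 0]  # 1 = pro-longevity, 0 = anti-longevity (assumed coding)

relevance = {name: mutual_information(col, labels) for name, col in features.items()}
selected = drop_hierarchical_redundancy(relevance, ancestors)
```

In this toy setting `B` separates the two classes perfectly, so its more general ancestor `A` and its more specific descendant `C` are both treated as redundant and dropped.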
