Abstract

The problem of underfitting and overfitting in machine learning is often associated with a bias-variance trade-off. Underfitting manifests most clearly in tree-based inducers when they are used to classify gene expression data. To improve the generalization ability of decision trees, we introduce an evolutionary, multi-test tree approach tailored to this specific application domain. The general idea is to apply, in each splitting rule, gene clusters of varying size that consist of functionally related genes. This is achieved by using a few simple tests that mimic each other’s predictions, together with built-in information about the discriminatory power of genes. The tendencies to underfit and overfit are limited by a multi-objective fitness function that minimizes tree error, split divergence and attribute costs. The evolutionary search for multi-tests in internal nodes and for the overall tree structure is performed simultaneously. This novel approach, called Evolutionary Multi-Test Tree (EMTTree), may bring far-reaching benefits to the domain of molecular biology, including biomarker discovery, finding new gene-gene interactions and high-quality prediction. Extensive experiments carried out on 35 publicly available gene expression datasets show that we significantly improved the accuracy and stability of decision trees. Importantly, EMTTree does not substantially increase the overall complexity of the tree, so the patterns in the predictive structures remain comprehensible.
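
To make the idea of a multi-test split more concrete, the following is a minimal illustrative sketch, not the EMTTree implementation. It assumes that each simple test is a threshold on a single gene, that the multi-test routes a sample by majority vote, and that the three objectives are combined as a weighted sum. The names `multi_test_goes_left` and `weighted_fitness`, the voting rule and the weights are all hypothetical; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a univariate test is (gene_index, threshold)
# and sends a sample to the left branch when expression <= threshold.
Test = Tuple[int, float]


def multi_test_goes_left(sample: Sequence[float], tests: List[Test]) -> bool:
    """Route a sample by majority vote over the tests (assumed voting rule)."""
    votes_left = sum(1 for gene, threshold in tests if sample[gene] <= threshold)
    return 2 * votes_left > len(tests)


def weighted_fitness(error: float, divergence: float, cost: float,
                     weights: Tuple[float, float, float] = (1.0, 0.5, 0.5)) -> float:
    """Combine the three objectives into one value to minimize.

    A weighted sum is an assumption made for illustration only; the
    weights here are arbitrary placeholders.
    """
    w_error, w_divergence, w_cost = weights
    return w_error * error + w_divergence * divergence + w_cost * cost


# Three tests on three genes; two of the three vote "left" for this sample.
sample = [0.2, 0.9, 0.4]
tests = [(0, 0.5), (1, 0.5), (2, 0.5)]
goes_left = multi_test_goes_left(sample, tests)
score = weighted_fitness(error=0.1, divergence=0.2, cost=0.3)
```

In this sketch, an odd number of tests avoids ties, and a lower `score` would indicate a better candidate tree under the assumed weighting.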
