Abstract

Machine learning for data mining applications in the field of bioinformatics is to extract new knowledge to provide an improved and effective diagnosis process for patients. In this paper, we introduce an adaptive ensemble learning for classifying high-dimensional multi-class imbalanced genomic data. The aspect is to design and develop an optimal ensemble method for information discovery on genomic data, which improve the prediction accuracy of DNA variant classification. The proposed method is based on ensemble of decision trees, data pre-processing, feature selection and grouping. It converts an imbalanced genomic data into multiple balanced ones and then builds a number of decision trees on these multiple data with specific feature groups. The outputs of these trees are combined for classifying new instances by majority voting technique. In this empirical study, different ensemble predictive modelling techniques like Random Forest, Boosting and Bagging were compared with the proposed ensemble method. The experimental results on genomic data (148 Exome datasets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel show that the proposed method is usually superior to the conventional ensemble learning algorithms when classifying the high-dimensional multi-class imbalanced genomic data.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.