Abstract

Machine learning and data mining techniques have recently gained popularity in the field of medical diagnosis, especially for the analysis of the human genome. One of the most significant sources of human genome variation is single nucleotide polymorphisms (SNPs), which have been associated with multiple human diseases. Several techniques have been developed for distinguishing between affected and healthy samples in SNP data. In this study, conditional mutual information maximisation (CMIM) is employed to identify a subset of the most informative SNPs for use with various classification algorithms for the detection of hypertension. Five classification algorithms are evaluated, namely k-Nearest Neighbours (KNN), Artificial Neural Networks (ANN), Naive Bayes (NB), Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM), along with their combination into an unweighted majority voting ensemble classification scheme. Supervised classification experiments showed that the ensemble of the SVM, 5-NN, and NB classifiers achieves the highest classification accuracy (93.21%) and F1 score (91.72%), demonstrating the suitability of the proposed approach for the detection of hypertension from SNP data.
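
The abstract gives no implementation details, so the following is only a minimal sketch of the two techniques it names: greedy CMIM feature selection over discrete features, and unweighted majority voting over classifier outputs. It assumes SNPs are encoded as small integers (for example genotype codes), that each feature is stored as one list per SNP, and that the base classifiers (SVM, 5-NN, NB) have already produced per-sample predictions. All function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cond_mutual_info(y, x, z=None):
    """I(Y; X | Z) for discrete columns; plain I(Y; X) when z is None."""
    if z is None:
        return entropy(y) + entropy(x) - entropy(y, x)
    return entropy(y, z) + entropy(x, z) - entropy(y, x, z) - entropy(z)


def cmim_select(features, labels, k):
    """Greedy CMIM: return the indices of k selected features.

    A candidate's score starts at I(Y; X_c) and is lowered to the minimum of
    I(Y; X_c | X_s) over every already selected feature X_s, so candidates
    that are redundant with a selected feature drop towards zero.
    """
    score = [cond_mutual_info(labels, f) for f in features]
    selected = []
    for _ in range(min(k, len(features))):
        remaining = [i for i in range(len(features)) if i not in selected]
        best = max(remaining, key=lambda i: score[i])
        selected.append(best)
        for i in remaining:
            if i != best:
                cmi = cond_mutual_info(labels, features[i], features[best])
                score[i] = min(score[i], cmi)
    return selected


def majority_vote(predictions):
    """Unweighted majority vote; predictions holds one label list per classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): snp_b duplicates snp_a, snp_c is weaker but complementary.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
snp_a = [0, 0, 0, 1, 1, 1, 1, 0]
snp_b = list(snp_a)
snp_c = [0, 0, 1, 1, 0, 1, 1, 1]
chosen = cmim_select([snp_a, snp_b, snp_c], labels, k=2)
print("selected SNP indices:", chosen)

# Hypothetical per-sample outputs of three base classifiers (e.g. SVM, 5-NN, NB).
print("ensemble labels:", majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

In the toy data the duplicate SNP carries no information once its twin has been selected, which is the redundancy that CMIM is designed to penalise. Using an odd number of classifiers on a two-class problem, as in the three-classifier ensemble reported above, avoids tied votes.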
