Abstract

In bioinformatics, identifying the gene subsets responsible for classifying available samples into two or more classes (for example, 'malignant' or 'benign') is an important task. The main difficulties in solving the resulting optimization problem are that only a few samples are available compared to the number of genes in each sample, and that the search space of solutions is exorbitantly large. Although a few applications of evolutionary algorithms (EAs) to this task exist, we treat the problem as a multi-objective optimization problem of simultaneously minimizing the gene subset size and the number of misclassified samples. Contrary to past studies, we have discovered that a small gene subset (of size four or five, for example) is enough to correctly classify 100% or nearly 100% of the samples for three cancer data sets (Leukemia, Lymphoma, and Colon). In addition to a few variants of NSGA-II, in one implementation we modify NSGA-II to find multi-modal non-dominated solutions, discovering as many as 630 different three-gene combinations that provide 100% correct classification of the Leukemia data. To perform the identification task with more confidence, we have also introduced a threshold on the prediction strength. All simulation results show consistent gene subset identifications on the three data sets and demonstrate the flexibility and efficacy of using a multi-objective EA for the gene identification task.
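The two-objective formulation described above can be illustrated with a minimal sketch of Pareto dominance over pairs of (gene subset size, number of misclassified samples). This is not the NSGA-II procedure used in the study; the function names and the candidate values are hypothetical and serve only to show which trade-offs would count as non-dominated.

```python
from typing import List, Tuple

# (gene subset size, number of misclassified samples); both are minimized.
Objectives = Tuple[int, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Return the distinct non-dominated objective pairs, sorted by subset size."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical objective values for illustration only (not results from the paper).
candidates = [(3, 0), (4, 0), (2, 1), (5, 2), (1, 4), (2, 3)]
front = non_dominated(candidates)
print(front)
```

Note that this sketch collapses candidates with identical objective values into one point, whereas the multi-modal variant mentioned above is designed to retain different gene subsets that share the same objective values.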
