Abstract

Cancer diagnosis based on gene analysis is a major research area in bioinformatics and machine learning. Microarray technology can measure the expression levels of thousands of genes in a sample simultaneously. However, mutations or expression changes in only a small number of genes can lead to cancer, and the expression levels of most genes are essentially the same in cancerous and healthy samples. Moreover, the main challenge in microarray data is that the number of genes is very large relative to the small number of samples. This makes gene selection an essential step in microarray analysis. In this paper, we propose a new two-phase gene selection method for microarray data. In the first phase, we take a different approach: the genes, which are the features of the microarray data, are treated as the training samples in place of the cancerous and healthy samples, and anomaly detection is then used to greatly reduce the number of genes. In the second phase, we apply a guided genetic algorithm to the genes retained from the first phase to obtain the final set of effective genes. In our experiments, the method reduces the number of genes by at least 99% on all datasets. In addition to this very high reduction rate, the selected genes significantly increase classification accuracy.
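The two-phase idea can be illustrated with a minimal sketch. It is not the paper's implementation, and several pieces are assumptions made for illustration: each gene is described by a single feature (the gap between its class means), the anomaly detector is a simple median/MAD outlier score, the second phase is a plain genetic algorithm rather than the guided one, and fitness is the leave-one-out accuracy of a nearest-centroid classifier minus a small size penalty. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
import statistics

def anomaly_scores(X, y):
    """Phase 1: treat each gene (column of X) as one data point and score how atypical it is."""
    gaps = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        gaps.append(statistics.mean(pos) - statistics.mean(neg))
    med = statistics.median(gaps)
    mad = statistics.median([abs(v - med) for v in gaps]) or 1e-12
    # Assumed detector: robust z-score of the class-mean gap across all genes.
    return [abs(v - med) / mad for v in gaps]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    if not genes:
        return 0.0
    hits = 0
    for i in range(len(X)):
        dists = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroid = [statistics.mean([r[g] for r in rows]) for g in genes]
            dists[lab] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        hits += min(dists, key=dists.get) == y[i]
    return hits / len(X)

def genetic_search(X, y, candidates, rng, pop_size=20, generations=10, penalty=0.01):
    """Phase 2 (plain GA, not the guided variant): evolve bit masks over the candidates."""
    def fitness(mask):
        genes = [g for g, bit in zip(candidates, mask) if bit]
        return loo_accuracy(X, y, genes) - penalty * len(genes)

    pop = [[rng.random() < 0.5 for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.2:              # occasional single-bit mutation
                flip = rng.randrange(len(child))
                child[flip] = not child[flip]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]

def select_genes(X, y, top_k=10, seed=0):
    """Run both phases; returns (phase-1 candidates, phase-2 selection)."""
    rng = random.Random(seed)
    scores = anomaly_scores(X, y)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    candidates = ranked[:top_k]
    return candidates, genetic_search(X, y, candidates, rng)

def make_toy_data(rng, n_samples=20, n_genes=200, informative=(3, 57)):
    """Synthetic stand-in for microarray data: only `informative` genes differ by class."""
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (6.0 if g in informative and lab else 0.0)
          for g in range(n_genes)] for lab in y]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data(random.Random(1))
    candidates, selected = select_genes(X, y)
    print("phase 1 candidates:", candidates)
    print("phase 2 selection:", selected)
```

On the synthetic data, phase 1 cuts 200 genes down to 10 candidates, and phase 2 searches only over subsets of those candidates, which is what keeps the genetic search cheap. A real pipeline would likely swap in a stronger anomaly detector and classifier.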
