Abstract

Genetics plays a prominent role in the development and progression of malignant neoplasms. Identifying the relevant genes is a high-dimensional data processing problem. The pyramid gravitational search algorithm (PGSA), a hybrid method in which the number of genes is cyclically reduced, is proposed to overcome the curse of dimensionality. PGSA consists of two elements, a filter and a wrapper method (inspired by the gravitational search algorithm) that iterates through cycles. The genes selected in each cycle are passed on to the subsequent cycles to further reduce the dimension. PGSA tries to maximize classification accuracy using the most informative genes while reducing the number of genes. Results are reported on a multi-class microarray gene expression dataset for breast cancer. Several feature selection algorithms were implemented for a fair comparison. PGSA ranked first in terms of accuracy (84.5%) with 73 genes. To check whether the selected genes are meaningful in terms of patient survival and response to therapy, protein-protein interaction network analysis was applied to the genes. An interesting pattern emerged when examining the genetic network: HSP90AA1, PTK2 and SRC were among the top-rated bottleneck genes, and the DNA damage, cell adhesion and migration pathways are highly enriched in the network.

Highlights

  • Classification of high-dimensional microarray gene expression data is a major problem in bioinformatics

  • In the course of this work, we discovered that heat shock protein 90-alpha (HSP90AA1) is the most prominent gene in the network. Based on the available data, HSP90AA1 is an evolutionarily conserved protein with a prominent role in processes such as DNA damage, inflammation and tumorigenesis. A considerable body of evidence shows that plasma levels of HSP90AA1 have clinical benefit in predicting the onset and risk of metastasis in breast cancer patients [51]

  • Compared with the genetic algorithm (GA), particle swarm optimization (PSO) and the imperialist competitive algorithm (ICA), the pyramid gravitational search algorithm (PGSA) reached a lower number of genes while achieving an accuracy of 84.5 percent


Summary

Introduction

Classification of high-dimensional microarray gene expression data is a major problem in bioinformatics. Wrapper methods work jointly with a classifier and try to find the features that yield maximum classification accuracy. Feature selection problems are NP-hard, so heuristic random search algorithms are a suitable choice. They can find sub-optimal solutions to large, complicated problems, and in some cases they are more accurate and applicable than filter-based methods. Inspired by a random search algorithm, these methods try to select the best subset of features. In feature selection using heuristic search algorithms, the goal is to maximize classification accuracy [6]. A pyramid version of the gravitational search algorithm (GSA) is used for solving high-dimensional gene selection problems. The proposed method is a hybrid approach that cyclically reduces the number of genes and selects the fewest genes needed to achieve high classification accuracy.
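
The cyclic filter-then-wrapper idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the wrapper is a plain random bit-flip search standing in for the gravitational search step, and the filter score, the nearest-centroid classifier, the size penalty, and all names and parameters (`pyramid_select`, `wrapper_search`, `n_filter`, `cycles`, `penalty`) are hypothetical. The data are synthetic.

```python
import random

def nearest_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][g] - acc[k] / counts[label]) ** 2
                       for k, g in enumerate(genes))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def filter_score(X, y, g):
    """Filter step (assumed score): squared spread of class means over variance."""
    values = [row[g] for row in X]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-12
    class_means = []
    for label in set(y):
        vals = [row[g] for row, lab in zip(X, y) if lab == label]
        class_means.append(sum(vals) / len(vals))
    return (max(class_means) - min(class_means)) ** 2 / var

def wrapper_search(X, y, candidates, rng, iterations=60, penalty=0.01):
    """Wrapper step: random bit-flip search, a simple stand-in for IBGSA."""
    def fitness(subset):
        # Reward accuracy, lightly penalize larger subsets (assumed trade-off).
        size_term = penalty * len(subset) / len(candidates)
        return nearest_centroid_accuracy(X, y, subset) - size_term
    current = [g for g in candidates if rng.random() < 0.5] or candidates[:1]
    current_fit = fitness(current)
    for _ in range(iterations):
        g = rng.choice(candidates)
        trial = [c for c in current if c != g] if g in current else current + [g]
        if not trial:
            continue  # never evaluate an empty gene set
        trial_fit = fitness(trial)
        if trial_fit >= current_fit:
            current, current_fit = trial, trial_fit
    return sorted(current)

def pyramid_select(X, y, n_filter=20, cycles=3, seed=0):
    """Filter once, then run the wrapper repeatedly on the surviving genes."""
    rng = random.Random(seed)
    ranked = sorted(range(len(X[0])),
                    key=lambda g: filter_score(X, y, g), reverse=True)
    genes = ranked[:n_filter]
    history = [len(genes)]
    for _ in range(cycles):
        # Each cycle may only choose among the genes kept by the previous one.
        genes = wrapper_search(X, y, genes, rng)
        history.append(len(genes))
    return genes, history

def make_toy_data(n_samples=30, n_genes=200, n_informative=5, seed=1):
    """Synthetic three-class data; only the first few genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 3
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    genes, history = pyramid_select(X, y)
    print("genes per cycle:", history)
    print("leave-one-out accuracy:", nearest_centroid_accuracy(X, y, genes))
```

Because each cycle searches only among the genes kept by the previous cycle, the candidate pool can only shrink from one cycle to the next; that narrowing is the "pyramid" aspect this sketch tries to capture.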

Previous works
Materials and methods
The proposed method
Gene reduction pyramidically using IBGSA
Model evaluation
Experimental data
Benchmark algorithms
Workflow of feature selection
Feature selection benchmark
Network analysis
Conclusion
