Abstract

In bioinformatics, the rapid development of gene sequencing technology has produced an increasing amount of microarray data. This type of data is typically characterized by small sample sizes and high feature dimensionality. Searching microarray data for biomarkers, the expression features that characterize various diseases, is essential for disease classification. Microarray data contain a large number of redundant and irrelevant features, which severely degrade classification performance, so feature selection, which aims to remove such features, has become fundamental to microarray analysis. We propose an innovative feature selection method that obtains feature dependencies from a priori knowledge and removes redundant features using spectral clustering. First, a graph structure is constructed using the gene interaction network as a priori knowledge. Then, a link prediction method based on a graph neural network is proposed to enhance the graph-structured data. Finally, a feature selection method based on spectral clustering is proposed to determine biomarkers. Compared with traditional methods, classification accuracy on the DLBCL and Prostate datasets improves by 10.90% and 16.22%, respectively. Link prediction provides average classification accuracy improvements of 1.96% and 1.31%, and the method achieves up to 16.98% higher accuracy than the previously published method. The results show that the proposed method makes full use of a priori knowledge to effectively select disease prediction biomarkers with high classification accuracy.
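The abstract does not say which graph neural network is used for link prediction. As a minimal sketch, assuming a graph-autoencoder-style model (a two-layer GCN encoder with an inner-product decoder) written in NumPy, the code below shows how scored candidate links could be added to a gene interaction graph. The function names, the toy graph, the identity node features, the random untrained weights, and the `top_m` cutoff are all hypothetical. A real model would train the weights, for example with binary cross-entropy on observed edges versus sampled non-edges.

```python
import numpy as np

# Hypothetical GAE-style link prediction sketch; not the paper's exact model.

def normalize_adj(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def gcn_encode(adj, feats, weights):
    """Two GCN layers: H = ReLU(A_norm X W0), Z = A_norm H W1."""
    a_norm = normalize_adj(adj)
    h = np.maximum(a_norm @ feats @ weights[0], 0.0)
    return a_norm @ h @ weights[1]

def link_scores(z: np.ndarray) -> np.ndarray:
    """Inner-product decoder: sigmoid(z_i . z_j) for every node pair."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def augment_graph(adj, scores, top_m):
    """Add the top_m highest-scoring node pairs that are not already edges."""
    iu, ju = np.triu_indices(len(adj), k=1)
    is_non_edge = adj[iu, ju] == 0
    cand_i, cand_j = iu[is_non_edge], ju[is_non_edge]
    best = np.argsort(-scores[cand_i, cand_j])[:top_m]
    new_adj = adj.copy()
    new_adj[cand_i[best], cand_j[best]] = 1.0   # keep the graph undirected
    new_adj[cand_j[best], cand_i[best]] = 1.0
    return new_adj

# Toy 5-gene interaction graph (hypothetical data).
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
feats = np.eye(5)                      # identity features: no gene attributes assumed
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(5, 8)),   # untrained, for illustration only
           rng.normal(scale=0.5, size=(8, 4))]

scores = link_scores(gcn_encode(adj, feats, weights))
augmented = augment_graph(adj, scores, top_m=2)
print(int((augmented - adj).sum() // 2), "links added")
```

Only pairs that are not already edges are candidates, so the augmented graph is always a superset of the prior-knowledge graph.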

Highlights

  • In bioinformatics, the rapid development of gene sequencing technology has produced an increasing amount of microarray data

  • Effective gene selection can significantly enhance disease prediction and diagnosis; it has been extensively studied in cancer pathogenesis and pharmacology

  • To further mine the information in graph-structured data, we propose a link prediction technique based on a graph neural network to refine the gene network, and combine spectral clustering with feature selection to determine biomarkers; the experimental results demonstrate the effectiveness and advantages of this method
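The spectral clustering step can be sketched as follows, assuming NumPy, the symmetric normalized Laplacian with row-normalized eigenvectors (Ng-Jordan-Weiss style), and plain k-means. How the paper picks a representative biomarker from each cluster is not stated here; choosing the gene with the largest absolute Pearson correlation with the class label is a hypothetical criterion used only for illustration, as are the function names and the toy data.

```python
import numpy as np

def spectral_embedding(adj: np.ndarray, k: int) -> np.ndarray:
    """Row-normalized bottom-k eigenvectors of L_sym = I - D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])     # isolated genes keep weight 0
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                # eigenvalues in ascending order
    u = vecs[:, :k]
    norms = np.linalg.norm(u, axis=1, keepdims=True)
    return u / np.maximum(norms, 1e-12)

def kmeans(points: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def select_features(X, y, adj, n_clusters):
    """Cluster genes on the graph, then keep one gene per cluster (hypothetical rule)."""
    clusters = kmeans(spectral_embedding(adj, n_clusters), n_clusters)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute Pearson correlation of each gene (column of X) with the label.
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    selected = [int(np.flatnonzero(clusters == c)[np.argmax(corr[clusters == c])])
                for c in np.unique(clusters)]
    return sorted(selected)

# Toy data: 6 genes forming two interaction modules, 8 samples, 2 classes.
adj = np.zeros((6, 6))
adj[:3, :3] = 1.0
adj[3:, 3:] = 1.0
np.fill_diagonal(adj, 0.0)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = np.array([
    [1, 2, 3, 4, 1, 2, 3, 4],     # gene 0: uninformative
    [0, 0, 0, 0, 1, 1, 1, 1],     # gene 1: tracks the label
    [4, 3, 2, 1, 4, 3, 2, 1],     # gene 2: uninformative
    [1, -1, 1, -1, 1, -1, 1, -1], # gene 3: uninformative
    [1, 1, 1, 1, 0, 0, 0, 0],     # gene 4: anti-correlated with the label
    [2, 0, 2, 0, 2, 0, 2, 0],     # gene 5: uninformative
], dtype=float).T                 # shape (samples, genes)

print(select_features(X, y, adj, n_clusters=2))
```

Keeping a single gene per graph cluster is one simple way to discard redundant, strongly interacting genes while retaining the most label-relevant one from each module.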

Summary

Introduction

The rapid development of gene sequencing technology has produced an increasing amount of microarray data. This type of data is typically characterized by small sample sizes and high feature dimensionality. Microarray data contain a large number of redundant and irrelevant features, which severely degrade classification performance. Conventional pattern recognition methods are not suitable for data with high dimensionality and few samples [1]. For such data, removing redundant features and mining the useful biological information hidden in the massive data have become key problems in recognition research. Existing work uses only IntScore to handle protein dependencies and does not evaluate potential feature dependencies.
