Abstract

Identifying the key tumor-related genes in gene expression data with a large number of features is important for accurate tumor classification and for making specific treatment decisions. In recent years, unsupervised feature selection algorithms have attracted considerable attention in gene selection because they can find the most discriminating subsets of genes, that is, the latent information in biological data. Recent research also shows that preserving the important structure of the data is necessary for gene selection. However, most current feature selection methods capture only the local structure of the original data and ignore its global structure. We believe that the global and local structures of the original data are equally important, so the selected genes should preserve the essential structure of the original data as far as possible. In this paper, we propose a new adaptive unsupervised feature selection scheme that not only reconstructs high-dimensional data in a low-dimensional space under the constraint of feature distance invariance but also employs a norm-based constraint so that a matrix capable of performing gene selection is embedded into the local manifold structure-learning framework. Moreover, an effective algorithm is developed to solve the optimization problem arising from the proposed scheme. Comparative experiments with several classical schemes on real tumor datasets demonstrate the effectiveness of the proposed method.
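
As a rough illustration of how a norm constraint can give a matrix the ability to select genes, the sketch below ranks features by the row norms of a projection matrix. The text above does not name the norm, so the sketch assumes the row-wise ℓ2,1-norm, a common choice for row-sparse feature selection. The matrix `W`, the function names, and the toy values are hypothetical, and this is not the authors' optimization algorithm.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per gene/feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """Assumed l2,1-norm: the sum of the Euclidean norms of the rows."""
    return sum(row_norms(W))


def select_features(W, k):
    """Indices of the k rows with the largest norm, largest first."""
    norms = row_norms(W)
    order = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return order[:k]


# Hypothetical projection matrix: 5 genes mapped to 2 embedding dimensions.
# Rows that are (near) zero contribute little to the embedding, so the
# corresponding genes would be discarded.
W = [
    [0.0, 0.0],
    [3.0, 4.0],
    [0.1, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
]

selected = select_features(W, k=2)
penalty = l21_norm(W)
```

Under this assumption, penalizing the sum of row norms drives entire rows toward zero, which is why the surviving rows can be read as the selected genes.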

Highlights

  • Published: 23 May 2021. Cancers are responsible for the majority of global deaths and are expected to rank as the leading cause of death

  • We evaluated the feature selection performance of our approach in comparison experiments with several typical feature selection methods: URAFS, UDFS, SPEC, the nonnegative discriminative feature selection algorithm (NDFS), LLCFS, and joint low-dimensional embedded learning and sparse regression (JELSR)

  • We present an adaptive, unsupervised feature selection algorithm that combines gene selection and structure learning into a unified framework of sparse representation


Summary

Introduction

Cancers are responsible for the majority of global deaths and are expected to rank as the leading cause of death. In the treatment of cancers, correctly diagnosing the type and nature of a tumor at as early a stage as possible improves treatment efficacy [2]. The development of DNA microarray technology has made it possible to study the causes of cancers at the level of genes, which greatly improves both the accuracy of diagnosis and the effectiveness of cancer treatment. DNA microarray data are usually high-dimensional, with the number of genes in a sample often running into thousands or even tens of thousands, yet often only a few key genes determine a specific tumor [3]. Selecting the important genes related to cancer classification from this huge original set of genes is one of the key research areas in gene data classification.

