Abstract

To select more effective feature genes, many existing algorithms focus on choosing and studying evaluation measures for feature genes, while neglecting how accurately the original information in the data is mapped during processing. To address this problem, this paper proposes a new model: the rough uncertainty metric model. First, the fuzzy neighborhood granule of each sample is constructed by combining the fuzzy similarity relation with the neighborhood radius of the rough set, and the rough decision is defined using the fuzzy similarity relation and the decision equivalence class. Then, the fuzzy neighborhood granule and the rough decision are introduced into the conditional entropy, yielding the rough uncertainty metric model; a measure of feature gene significance is also defined, and several related theorems are proved. To make the model tolerant of noise in the data, this paper introduces a variable precision model and discusses how to select its parameters. Finally, based on the rough uncertainty metric model, we design a feature gene selection algorithm and compare it with several existing similar algorithms. The experimental results show that the proposed algorithm selects smaller feature gene subsets with higher classification accuracy, which verifies that the proposed model is more effective.
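The abstract does not give the formulas, so the following is only a rough illustration of how the pieces could fit together, under hypothetical definitions chosen for this sketch: features are assumed normalized to [0, 1], the per-feature fuzzy similarity is taken as 1 − |a(x) − a(y)|, a feature subset B combines features by the minimum, and a sample y enters the fuzzy neighborhood granule of x with its similarity degree only when that degree is at least 1 − δ (δ playing the role of the neighborhood radius). The conditional entropy uses fuzzy cardinalities (sums of memberships) against crisp decision classes, and the significance of a gene a is the entropy drop H(D|B) − H(D|B ∪ {a}). The variable precision extension is omitted. This is not the paper's exact rough uncertainty metric.

```python
import math

def similarity(X, a, i, j):
    # Hypothetical fuzzy similarity of samples i and j on feature a (data assumed in [0, 1]).
    return 1.0 - abs(X[i][a] - X[j][a])

def granule(X, B, i, delta):
    # Assumed fuzzy neighborhood granule of sample i under feature subset B:
    # membership = min similarity over B, zeroed when it falls below 1 - delta.
    g = []
    for j in range(len(X)):
        r = min((similarity(X, a, i, j) for a in B), default=1.0)
        g.append(r if r >= 1.0 - delta else 0.0)
    return g

def conditional_entropy(X, y, B, delta):
    # Assumed entropy: -(1/n) * sum_x log(|granule(x) within class(x)| / |granule(x)|).
    # The granule always contains x itself with membership 1, so the ratio is positive.
    n = len(X)
    total = 0.0
    for i in range(n):
        g = granule(X, B, i, delta)
        card = sum(g)
        same = sum(m for m, label in zip(g, y) if label == y[i])
        total += math.log(same / card)
    return -total / n

def select_features(X, y, delta=0.2, eps=1e-6):
    # Greedy forward selection: add the gene with the largest significance
    # (entropy drop) and stop when no gene lowers the entropy by more than eps.
    m = len(X[0])
    selected = []
    current = conditional_entropy(X, y, selected, delta)
    while len(selected) < m:
        candidates = [a for a in range(m) if a not in selected]
        scores = {a: conditional_entropy(X, y, selected + [a], delta) for a in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] <= eps:
            break
        selected.append(best)
        current = scores[best]
    return selected

X = [[0.1, 0.5], [0.15, 0.9], [0.8, 0.1], [0.85, 0.6]]
y = [0, 0, 1, 1]
print(select_features(X, y, delta=0.2))
```

On this toy set, feature 0 separates the two classes and feature 1 is noise, so the sketch selects only feature 0 and prints `[0]`.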

Highlights

  • Nowadays, with the continuous changes in human lifestyle and environment, the incidence and mortality of cancers are rising. Therefore, how to improve the analysis, identification, and treatment of tumors has become one of the research hotspots for scholars [1].

  • If the original information in the data is used in the computation as accurately as possible, the results of feature gene selection can be substantially improved. The classical rough set proposed by Pawlak [7] has been extensively developed and studied [8, 9].

  • Support vector machine and K-nearest neighbor (KNN, K = 3) classifiers are used to evaluate the classification accuracy of the feature gene subsets by 10-fold cross-validation. The comparison process is shown in Figures 1–4.
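The KNN half of this evaluation protocol can be sketched as below. This is a stdlib-only illustration, not the authors' experimental code: it assumes Euclidean distance, uses plain shuffled folds rather than stratified ones, and leaves out the support vector machine. The helper names `knn_predict`, `k_fold_indices`, and `cross_val_accuracy` are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Majority vote among the k nearest training samples (Euclidean distance assumed).
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def k_fold_indices(n, folds=10, seed=0):
    # Shuffle sample indices once, then deal them round-robin into folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]

def cross_val_accuracy(X, y, selected, k=3, folds=10, seed=0):
    # Accuracy of KNN restricted to the selected feature (gene) columns.
    Xs = [[row[a] for a in selected] for row in X]
    correct = 0
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_X = [Xs[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return correct / len(X)

# Toy data: feature 0 separates the classes, feature 1 is arbitrary.
X = [[0.05 * i, (i * 7 % 10) / 10] for i in range(10)]
X += [[2.0 + 0.05 * i, (i * 3 % 10) / 10] for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_val_accuracy(X, y, selected=[0]))
```

Evaluating only the selected columns is what lets the same routine score any candidate feature gene subset; in practice a library implementation with stratified folds would be preferable.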


Summary

Introduction

With the continuous changes in human lifestyle and environment, the incidence and mortality of cancers are rising. Therefore, how to improve the analysis, identification, and treatment of tumors has become one of the research hotspots for scholars [1]. Among the large number of genes in gene expression profiling data, only a few important genes can serve as informative genes for tracking diseases [5, 6]. In the classical rough set, continuous data must be discretized before processing, which causes problems such as information loss. To solve this problem, the neighborhood rough set [10,11,12,13] and the fuzzy rough set [14,15,16,17,18] were successively proposed as two important models. The neighborhood rough set can process continuous data directly, which overcomes this shortcoming of the classical rough set, but it cannot accurately describe the fuzziness of samples in a fuzzy setting.
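To make the contrast concrete, here is a minimal sketch of the crisp neighborhood rough set idea mentioned above. It assumes Euclidean distance over the selected features and the commonly used dependency degree, i.e. the fraction of samples whose δ-neighborhood lies entirely inside their own decision class. The continuous values are used directly with no discretization step, but each sample either is or is not a neighbor, which is the limitation the fuzzy models address. Function names are illustrative.

```python
import math

def neighborhood(X, B, i, delta):
    # Crisp delta-neighborhood of sample i on feature subset B (Euclidean distance assumed).
    xi = [X[i][a] for a in B]
    return [j for j in range(len(X)) if math.dist(xi, [X[j][a] for a in B]) <= delta]

def dependency(X, y, B, delta):
    # Fraction of samples in the positive region: every neighbor shares the sample's label.
    pos = sum(1 for i in range(len(X))
              if all(y[j] == y[i] for j in neighborhood(X, B, i, delta)))
    return pos / len(X)

X = [[0.1, 0.5], [0.15, 0.9], [0.8, 0.1], [0.85, 0.6]]
y = [0, 0, 1, 1]
print(dependency(X, y, [0], delta=0.1))   # feature 0 separates the classes
print(dependency(X, y, [1], delta=0.15))  # feature 1 mixes samples 0 and 3
```

With feature 0 every neighborhood is class-pure, so the dependency is 1.0; with feature 1, samples 0 and 3 fall in each other's neighborhoods, leaving only half the samples in the positive region.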

