Abstract

Because label collection is costly, researchers now face large amounts of partially labeled gene expression data (plge-data). Single cell RNA-seq data (scrs-data) are an important kind of plge-data and reflect the abundance of gene transcript mRNA measured directly or indirectly in cells. For convenience, a decision information system (DIS) based on scrs-data is called a single cell gene decision space (scgd-space). Because scrs-data are high-dimensional, feature selection must be performed before clustering and classification. Existing feature selection methods based on equivalence relations are ineffective for the scgd-space because they require strict equality between information values. To address this problem, this paper studies uncertainty measurement for the scgd-space based on class-consistent technology and considers its application to semi-supervised gene selection. Class-consistent technology replaces equality with approximate equality between two expression values at a gene. Based on this technology, class-consistent and non-class-consistent relations on the cell set of the scgd-space are first established. Then, the scgd-space (O,A,d) is divided into a labeled space (Ol,A,d) and an unlabeled space (Ou,A,d). Next, four importance metrics are defined on each gene subset of (O,A,d). Each metric is a weighted sum over (Ol,A,d) and (Ou,A,d), determined by the missing rate of labels and the established relations, and can be used to measure the uncertainty of (O,A,d). As an application of the four metrics to the scgd-space, a semi-supervised gene selection algorithm is designed. Finally, experimental results and statistical tests on 16 large-scale scrs-data sets show that the defined metrics effectively measure the uncertainty of the scgd-space. The designed algorithm achieves a high reduction rate and outperforms several state-of-the-art feature selection algorithms on eight performance evaluation indicators.
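
The two ideas above that lend themselves to a concrete illustration are approximate equality between expression values and the split of the cell set by label availability. The following is only a minimal sketch under stated assumptions, not the paper's definitions: the tolerance `delta`, the function names, the use of NaN to mark a missing label, and the particular weighting `(1 - missing_rate)` versus `missing_rate` are all hypothetical choices made for illustration, since the abstract does not specify the relations or the four metrics.

```python
import numpy as np

def approx_equal_relation(X, delta):
    """Boolean matrix R with R[i, j] True when cells i and j differ by at most
    `delta` on every gene (column) of X. `delta` is an assumed tolerance."""
    diff = np.abs(X[:, None, :] - X[None, :, :])  # shape: (cells, cells, genes)
    return (diff <= delta).all(axis=2)

def split_by_label(labels):
    """Return indices of labeled cells, indices of unlabeled cells, and the
    label missing rate. A missing label is assumed to be encoded as NaN."""
    labels = np.asarray(labels, dtype=float)
    unlabeled = np.isnan(labels)
    return np.where(~unlabeled)[0], np.where(unlabeled)[0], float(unlabeled.mean())

def weighted_score(score_labeled, score_unlabeled, missing_rate):
    """Combine a score on the labeled part and a score on the unlabeled part.
    The weighting by missing rate shown here is an illustrative assumption."""
    return (1.0 - missing_rate) * score_labeled + missing_rate * score_unlabeled

if __name__ == "__main__":
    # Toy data: 3 cells, 2 genes; the second cell has no label.
    X = np.array([[1.0, 2.0], [1.05, 2.1], [3.0, 2.0]])
    labels = [0, np.nan, 1]
    print(approx_equal_relation(X, delta=0.2))
    print(split_by_label(labels))
```

With strict equality, no two cells in the toy matrix would be related; with a tolerance, the first two cells are grouped, which is the motivation the abstract gives for moving away from equivalence relations.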
