Abstract

DNA microarray technology can monitor the expression of thousands of genes in a single experiment. One important application of this high-throughput gene expression data is classifying samples into known categories. Because the number of genes often exceeds the number of samples, classical classification methods perform poorly in this setting. Moreover, many genes are irrelevant or redundant and reduce classification accuracy, so a gene selection step is necessary; classification using the selected genes is expected to be more accurate. This paper proposes a novel method for informative gene selection and sample classification on gene expression data. The method is based on Linear Discriminant Analysis (LDA) in both the regular space and the null space of the within-class scatter matrix. Genes with smaller coefficients in the optimal projection basis vectors are filtered out recursively, so the remaining genes become increasingly informative. Experiments on the leukemia dataset and the colon dataset show that the genes in the selected subset are much less correlated and have more discriminative power than those selected by classical methods.

Keywords (added by machine, not by the authors): Acute Myeloid Leukemia, Acute Lymphoblastic Leukemia, Linear Discriminant Analysis, Null Space, Gene Selection
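The recursive filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a two-class problem with labels 0 and 1, and it replaces the paper's regular-space and null-space LDA with a plain ridge-regularized Fisher discriminant so that the within-class scatter matrix stays invertible when genes outnumber samples. The function names and the parameters `n_keep`, `drop_frac`, and `ridge` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fisher_direction(X, y, ridge=1e-3):
    """Two-class Fisher/LDA projection vector (ridge-regularized stand-in)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed outer products of class-centered samples.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Assumption: a ridge term substitutes for the paper's null-space treatment.
    Sw += ridge * np.eye(Sw.shape[0])
    return np.linalg.solve(Sw, m1 - m0)


def recursive_gene_selection(X, y, n_keep, drop_frac=0.5, ridge=1e-3):
    """Repeatedly drop the genes with the smallest |coefficient|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    active = np.arange(X.shape[1])  # column indices of genes still in play
    while active.size > n_keep:
        w = fisher_direction(X[:, active], y, ridge)
        # Drop a fraction per round, but never go below n_keep genes.
        n_drop = min(max(1, int(active.size * drop_frac)), active.size - n_keep)
        order = np.argsort(np.abs(w))  # ascending: weakest genes first
        active = np.sort(active[order[n_drop:]])
    return active
```

Calling `recursive_gene_selection(X, y, n_keep=20)` on a samples-by-genes matrix `X` returns the sorted column indices of the surviving genes. The projection vector is refit after every round, so coefficients are always judged relative to the genes that remain.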
