Abstract

Gene selection aims to detect the genes that are most significantly expressed under different conditions in expression data. The current challenge in gene selection is comparing a large number of genes against a limited number of patient samples, so it is a non-trivial task for simple statistical analysis. Filter methods applied in gene selection studies adopt various statistical measurements, and their ability to discriminate phenotypes is crucial in classification and selection. Here we describe the standard deviation error distribution (SDED) method for gene selection. It utilizes within-class and among-class variations in gene expression data. We tested the method using 4 leukemia datasets available in the public domain and compared it with the GS2 and CHO methods. The prediction accuracies of SDED are better than those of both GS2 and CHO across the datasets, by 0.8-4.2% and 1.6-8.4%, respectively. Analyses of the related OMIM annotations and KEGG pathways verified that SDED can pick out 4.0% and 6.1% more genes with biological significance than GS2 and CHO, respectively.
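The abstract does not give the SDED formula, so the following is only a minimal sketch of the general idea behind filter methods that weigh among-class against within-class variation. The scoring rule (a ratio of among-class to within-class sums of squares), the function names `variance_ratio_scores` and `top_genes`, and the toy expression values are all assumptions made for illustration; they are not the SDED, GS2, or CHO methods.

```python
from statistics import mean


def variance_ratio_scores(X, y):
    """Score each gene by among-class / within-class sum of squares.

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, one per sample.
    NOTE: a generic filter score chosen for illustration, not SDED itself.
    """
    n_genes = len(X[0])
    classes = sorted(set(y))
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in X]
        overall = mean(column)
        among = 0.0   # spread of the class means around the overall mean
        within = 0.0  # spread of the samples around their own class mean
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = mean(values)
            among += len(values) * (class_mean - overall) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # A small constant guards against division by zero.
        scores.append(among / (within + 1e-12))
    return scores


def top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    scores = variance_ratio_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


# Toy data (made up): 6 samples x 3 genes, two classes.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.1],
    [0.9, 6.0, 1.9],
    [3.0, 5.5, 2.0],
    [3.1, 4.5, 2.2],
    [2.9, 5.0, 1.8],
]
y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

scores = variance_ratio_scores(X, y)
ranking = top_genes(X, y, k=2)
print(ranking)
```

In this toy matrix only the first gene separates the two classes, so it receives by far the largest score and is ranked first. A real analysis would also need a significance cutoff and a classifier to evaluate the selected genes.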

Highlights

  • DNA micro-array technology has enabled biologists to associate phenotypes with molecular genetics [1, 2]

  • Acute lymphoblastic leukaemia (ALL)-acute myeloid leukaemia (AML) dataset: the ALL-AML dataset is obtained from the cancer program of the Broad Institute [13]

  • It consists of 7129 gene expression profiles of two acute cases of leukaemia: (1) acute lymphoblastic leukaemia (ALL, 47 samples) and (2) acute myeloblastic leukaemia (AML, 25 samples)

Summary

Background

DNA micro-array technology has enabled biologists to associate phenotypes with molecular genetics [1, 2]. Genes with significant expression across the sample set are selected using sound statistical techniques. These discriminatory genes will help to classify different cancer subtypes [3, 4]. SVM is a powerful and popular machine-learning method and has been widely used in biological ... This is done for all samples and for every top-ranked x (from 1 to 100 with p < 0.01) genes in the datasets. ... one class has the same maximum vote, the classifier will have to make a random prediction. It is known that proper selection of parameter is very important for ... The performance of the SDED method (98.387%/96) was only comparable with GS2 (97.581%/68) and CHO (96.774%/87) in the ALL dataset. The numbers of genes in the dataset that are found in OMIM (Online Mendelian Inheritance in Man) and ...
