Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

Mingxing Yang, Suhuan Liu, Shuyu Yang, Zhibin Li, Xiumin Li, Xuejun Li, Zhimin Ou, Ming Liu, Vladimir B Bajic

doi:10.1371/journal.pone.0084253

Abstract

Motivation: DNA microarray analysis is characterized by a large number of gene variables measured on a small number of observations. Cluster analysis is widely used to analyze DNA microarray data for disease classification and diagnosis. Because a dataset contains many irrelevant and insignificant genes, a feature selection approach must be employed in the analysis. The performance of cluster analysis on these high-throughput data depends on whether the feature selection approach chooses the genes most relevant to the disease classes.

Results: Here we propose a new method that uses multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes for three-class disease classification and prediction. We tested our method on Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for the two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The ability of the selected features to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested on four public datasets and compared with that of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and that it performed better than the other methods.
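The abstract does not spell out the computations behind an OPLS-DA model and its S-plot. The sketch below is one minimal way to obtain them, assuming the single-response OPLS algorithm of Trygg and Wold with a 0/1 class indicator as y, mean-centered but otherwise unscaled data, and the usual S-plot axes: the covariance p(cov) and the correlation p(corr) between the predictive score t_p and each gene. The function names and the cutoffs passed to `select_genes` are hypothetical; the paper's exact preprocessing (for example Pareto or unit-variance scaling) and selection thresholds are not given here.

```python
import math


def center_columns(X):
    """Mean-center every column (gene) of a row-major n x m matrix."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def opls_da_predictive_score(X, y, n_orth=1):
    """Predictive score t_p of a single-y OPLS model (hypothetical helper).

    X: n samples x m genes; y: 1 for the target class, 0 for the rest.
    """
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n, m = len(Xc), len(Xc[0])
    # Predictive weights: unit vector proportional to X^T y.
    w = normalize([sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)])
    for _ in range(n_orth):
        t = [dot(row, w) for row in Xc]
        tt = dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        # Orthogonal weights: the part of the loading p that is not along w.
        wp = dot(w, p)
        resid = [p[j] - wp * w[j] for j in range(m)]
        if math.sqrt(dot(resid, resid)) < 1e-12:
            break  # no y-orthogonal variation left to remove
        w_o = normalize(resid)
        t_o = [dot(row, w_o) for row in Xc]
        tot = dot(t_o, t_o)
        p_o = [sum(Xc[i][j] * t_o[i] for i in range(n)) / tot for j in range(m)]
        # Deflate: remove the y-orthogonal component t_o p_o^T from X.
        Xc = [[Xc[i][j] - t_o[i] * p_o[j] for j in range(m)] for i in range(n)]
    # t_o is orthogonal to y, so X^T y (and hence w) is unchanged by deflation.
    return [dot(row, w) for row in Xc]


def s_plot(X, t_p):
    """Per-gene S-plot coordinates: covariance and correlation with t_p."""
    Xc = center_columns(X)
    n, m = len(Xc), len(Xc[0])
    t_mean = sum(t_p) / n
    tc = [v - t_mean for v in t_p]
    s_t = math.sqrt(dot(tc, tc) / (n - 1))
    p_cov, p_corr = [], []
    for j in range(m):
        col = [row[j] for row in Xc]
        cov = dot(tc, col) / (n - 1)
        s_j = math.sqrt(dot(col, col) / (n - 1))
        p_cov.append(cov)
        p_corr.append(cov / (s_t * s_j) if s_j > 0 else 0.0)
    return p_cov, p_corr


def select_genes(p_cov, p_corr, cov_cut, corr_cut):
    """Indices of genes in the S-plot 'wings' (hypothetical cutoffs)."""
    return [j for j, (c, r) in enumerate(zip(p_cov, p_corr))
            if abs(c) >= cov_cut and abs(r) >= corr_cut]
```

Genes far out on both axes (large |p(cov)| and |p(corr)|) are the candidate markers, and the sign tells whether a gene is up- or down-regulated in the target class. Pure Python keeps the sketch dependency-free; a real microarray matrix with thousands of genes would call for NumPy.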

Highlights

DNA microarray analysis is an important tool in medicine and life sciences, because it measures simultaneously the expression levels of thousands of genes
For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes
We developed a novel three-class gene selection method for disease classification and prediction using mOPLS-DA models and S-plots to identify informative genes from microarray data
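The hierarchical and parallel model layouts described in the abstract can be expressed as a set of binary OPLS-DA tasks, each a pair of (sample indices, 0/1 response). The sketch assumes, for illustration only, the ALL-B / ALL-T / AML three-class split commonly used with Golub's leukemia data; the helper names `parallel_tasks`, `hierarchical_tasks`, and `union_selected` are hypothetical, and pooling each model's selected genes with a set union is an assumption about how the per-class markers are combined.

```python
def parallel_tasks(labels, classes):
    """Parallel layout: one model per class, class k (y=1) vs all others (y=0)."""
    idx = list(range(len(labels)))
    return {k: (idx, [1 if lab == k else 0 for lab in labels]) for k in classes}


def hierarchical_tasks(labels, parent, main_pos, sub_pos):
    """Hierarchical layout (hypothetical helper).

    Level 1: main class `main_pos` vs the other main class, over all samples.
    Level 2: subtype `sub_pos` vs its sibling subtypes, only within `main_pos`.
    `parent` maps each subtype label to its main class.
    """
    idx_all = list(range(len(labels)))
    y_main = [1 if parent[lab] == main_pos else 0 for lab in labels]
    idx_sub = [i for i, lab in enumerate(labels) if parent[lab] == main_pos]
    y_sub = [1 if labels[i] == sub_pos else 0 for i in idx_sub]
    return {"main": (idx_all, y_main), "sub": (idx_sub, y_sub)}


def union_selected(*gene_lists):
    """Pool the gene indices chosen by several models (assumed combination rule)."""
    return sorted(set().union(*gene_lists))


if __name__ == "__main__":
    labels = ["ALL-B", "ALL-B", "ALL-T", "AML", "AML"]
    parent = {"ALL-B": "ALL", "ALL-T": "ALL", "AML": "AML"}
    print(hierarchical_tasks(labels, parent, "ALL", "ALL-B"))
    print(parallel_tasks(labels, ["ALL-B", "ALL-T", "AML"])["AML"])
```

The hierarchical layout trains the subtype model only on samples of one main class, so its response vector is shorter than the label list; the parallel layout always uses every sample.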

Summary

Introduction

DNA microarray analysis is an important tool in medicine and the life sciences because it simultaneously measures the expression levels of thousands of genes. Microarray data typically consist of a relatively small number of samples (usually several dozen) and a large number of genes (several thousand), most of which may be irrelevant, insignificant, or redundant for disease classification and prediction [15,16]. Many gene-selection approaches for cluster analysis have been proposed, such as the signal-to-noise ratio (S2N) [5], artificial neural networks (ANN) [7], the Kruskal-Wallis nonparametric one-way ANOVA (KW) [17], the ratio of between-category to within-category sums of squares (BW) [18], nonparametric tests [19], the t-test [20,21], and the genetic algorithm combined with k-nearest neighbors (GA/KNN) [22]. Many of these approaches were designed for two-class gene selection problems, whereas only a few studies have addressed multiclass gene selection (three or more classes) and classification for cluster analysis.
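Two of the univariate filters listed above have short closed forms. The sketch scores a single gene with the signal-to-noise ratio, (μ1 − μ2)/(σ1 + σ2) for a target class versus the rest, and with the BW ratio, the between-class over the within-class sum of squares, which handles any number of classes. Using the sample standard deviation for σ is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def s2n(values, labels, pos):
    """Signal-to-noise score of one gene: (mu_pos - mu_rest) / (sd_pos + sd_rest)."""
    a = [v for v, lab in zip(values, labels) if lab == pos]
    b = [v for v, lab in zip(values, labels) if lab != pos]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def bw_ratio(values, labels):
    """Between-class / within-class sum of squares of one gene, any number of classes."""
    overall = mean(values)
    between = within = 0.0
    for k in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == k]
        mk = mean(group)
        between += len(group) * (mk - overall) ** 2
        within += sum((v - mk) ** 2 for v in group)
    return between / within


if __name__ == "__main__":
    gene = [1, 2, 3, 7, 8, 9]
    groups = ["a", "a", "a", "b", "b", "b"]
    print(s2n(gene, groups, "a"), bw_ratio(gene, groups))
```

S2N compares one class with the rest, so a three-class problem needs one score per class, whereas BW scores all classes at once; both rank genes one at a time and ignore correlations among them.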

Journal: PLoS ONE | Publication Date: Dec 30, 2013
Citations: 56 | License type: CC BY 4.0

Similar Papers

Comparative analysis of targeted metabolomics: Dominance-based rough set approach versus orthogonal partial least square-discriminant analysis
H Blasco ... R Słowiński
Journal of Biomedical Informatics | VOL. 53 | 11 Dec 2014

Comparative Analysis of Cultivation Region of Angelica gigas Using a GC-MS-Based Metabolomics Approach
Guibao Jiang ... Jae Yoon Leem
Korean Journal of Medicinal Crop Science | VOL. 24 | 30 Apr 2016

Two Species Origins Comparison of Herba Patriniae Based on Their Ingredients Profile by UPLC-QTOF/MS/MS and Orthogonal Partial Least Squares Discriminant Analysis.
Yonggui Song ... Ming Yang
Chemistry & Biodiversity | VOL. 19 | 31 Aug 2022

Discrimination of Different Part of Curcuma longa by HPLC Fingerprints Combined with Multivariate Statistical Analysis
Jiuhua Song ... Kai Shi
Indian Journal of Pharmaceutical Education and Research | VOL. 57 | 22 Mar 2023
