Abstract
BackgroundAccurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermines the performance of many algorithms.MethodIn order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data.ResultsThe effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contains a high level of noise.AvailabilityThe source code is attached as an additional file of this paper.
Highlights
Accurate classification of microarray data is critical for successful clinical diagnosis and treatment
We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor, especially when the data contains a high level of noise
We carried out experiments using simulated data as well as real microarray data to test the performance of ANMM4CBR
Summary
Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. Two typical problems that researches want to solve using microarray data are: (1) discovering informative genes for classification based on different cell-types or diseases [1]; (2) clustering and arranging genes according to their similarity in expression patterns [2]. Many commonly used classifiers are rule-based or statistical-based. One challenge of these methods on microarray data is the small sample size problem. With the limited number of training samples, it is difficult to obtain domain knowledge for rule-based systems or get accurate parameters (such as mean value and standard deviation) for statistical-based approaches
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.