Abstract

BackgroundThe most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels. Compared with traditional pattern classifications, gene expression-based data classification is typically characterized by high dimensionality and small sample size, which make the task quite challenging.ResultsIn this paper, we present a modified K-nearest-neighbor (KNN) scheme, which is based on learning an adaptive distance metric in the data space, for cancer classification using microarray data. The distance metric, derived from the procedure of a data-dependent kernel optimization, can substantially increase the class separability of the data and, consequently, lead to a significant improvement in the performance of the KNN classifier. Intensive experiments show that the performance of the proposed kernel-based KNN scheme is competitive to those of some sophisticated classifiers such as support vector machines (SVMs) and the uncorrelated linear discriminant analysis (ULDA) in classifying the gene expression data.ConclusionA novel distance metric is developed and incorporated into the KNN scheme for cancer classification. This metric can substantially increase the class separability of the data in the feature space and, hence, lead to a significant improvement in the performance of the KNN classifier.

Highlights

  • As an important application of this novel technology, the gene expression data are used to determine and predict the state of tissue samples, which has shown to be very helpful in clinical oncology

  • The most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels

  • Compared with traditional pattern classifications, gene expression-based data classification is typically characterized by high dimensionality and small sample size, which make the task quite challenging

Read more

Summary

Introduction

As an important application of this novel technology, the gene expression data are used to determine and predict the state of tissue samples, which has shown to be very helpful in clinical oncology. The most fundamental task using gene expression data in clinical oncology is to classify tissue samples according to their gene expression levels. In combination with pattern classification techniques, gene expression data can provide more reliable means to diagnose and predict various types of cancers than the traditional clinical methods. A number of methods have been applied or developed to classify microarray data [1,2,3,4,5,6]. These methods include Knearest-neighbor (KNN), boosting, linear discriminant (page number not for citation purposes)

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.