Abstract

Technological advances allow for the measurement of high-dimensional data sets with small sample sizes. When dealing with such high-dimensional data, the consistency of estimation and the classification accuracy are called into question. Partial least squares (PLS) scores have traditionally been coupled with linear discriminant analysis, which requires the PLS scores to be multivariate normally distributed. For the classification of high-dimensional data sets, we introduce PLS-NB, a classification strategy that combines PLS with variants of Naive Bayes (NB). PLS-NB with standard NB, PLS-NB-G with Gaussian (G) kernel NB, PLS-NB-N with non-parametric (N) kernel NB, and PLS-NB-L with Laplace (L) correction are compared on gene expression data to the reference approaches of PLS coupled with linear discriminant analysis (LDA) and sparse LDA, namely PLS-LDA and SPLS-LDA, respectively. Cross-validation is used in conjunction with Monte Carlo simulation to avoid over-fitting. The proposed PLS-NB classifiers are validated and calibrated against the reference classifiers. PLS-NB-N performs best in classifying embryonal cancer, with 89.1% accuracy on test data, and in classifying prostate cancer, with 92.3% accuracy on test data. The presented method appears to be a viable contender for high-dimensional data classification; its merits can be investigated further, and it can be applied to a variety of classification problems.
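
The following is a minimal sketch of the PLS-plus-Naive-Bayes pipeline with Monte Carlo cross-validation described above, assuming scikit-learn is available. The component count, Gaussian NB variant, split count, and test-set fraction are illustrative assumptions, not values taken from the paper.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score

def pls_nb_accuracy(X, y, n_components=3, n_splits=100, test_size=0.3, seed=0):
    """Monte Carlo cross-validated test accuracy of PLS scores fed into Gaussian NB."""
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                      random_state=seed)
    accuracies = []
    for train_idx, test_idx in splitter.split(X, y):
        # Fit PLS on the training fold only, to avoid information leakage.
        pls = PLSRegression(n_components=n_components)
        pls.fit(X[train_idx], y[train_idx])
        T_train = pls.transform(X[train_idx])
        T_test = pls.transform(X[test_idx])
        # GaussianNB stands in here for the NB variants (standard, G, N, L) compared in the paper.
        nb = GaussianNB()
        nb.fit(T_train, y[train_idx])
        accuracies.append(accuracy_score(y[test_idx], nb.predict(T_test)))
    return float(np.mean(accuracies))

# Usage with synthetic high-dimensional data (many genes, few samples):
# X = np.random.randn(60, 5000); y = np.random.randint(0, 2, 60)
# print(pls_nb_accuracy(X, y))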
