Abstract
Technological advances allow for the measurement of high-dimensional data sets with small sample sizes. When dealing with such high-dimensional data, the consistency of estimates and the accuracy of classification are called into question. Partial least squares (PLS) scores have traditionally been coupled with linear discriminant analysis, which requires the PLS scores to be multivariate normally distributed. For the classification of high-dimensional data sets, we introduce PLS-NB, a classification strategy that combines PLS with variants of Naive Bayes (NB). PLS-NB with standard NB, PLS-NB-G with Gaussian (G) kernel NB, PLS-NB-N with non-parametric (N) kernel NB, and PLS-NB-L with Laplace (L) correction are compared on gene expression data against the reference approaches of PLS coupled with linear discriminant analysis (LDA) and sparse LDA, namely PLS-LDA and SPLS-LDA, respectively. Cross-validation is used in conjunction with Monte Carlo simulation to avoid over-fitting. The proposed PLS-NB classifiers are calibrated and validated against the reference classifiers. PLS-NB-N performs best, classifying embryonal cancer with 89.1% accuracy and prostate cancer with 92.3% accuracy on test data. The presented method appears to be a viable contender for high-dimensional data classification; its merits can be investigated further, and it can be applied to a variety of classification problems.
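The pipeline described above, reducing high-dimensional features to a few PLS score components and then fitting a Naive Bayes classifier, evaluated with Monte Carlo cross-validation, can be illustrated with a minimal sketch. This assumes scikit-learn and synthetic stand-in data (the real study uses embryonal and prostate cancer gene expression sets); the standard Gaussian NB is shown, while the kernel-based and Laplace-corrected variants would swap in other density estimators.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for a gene-expression matrix:
# 100 samples x 5000 genes with binary class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Monte Carlo cross-validation: repeated random train/test splits.
mccv = ShuffleSplit(n_splits=50, test_size=0.3, random_state=0)
accuracies = []
for train_idx, test_idx in mccv.split(X):
    # Step 1: extract a small number of PLS score components on the training fold.
    pls = PLSRegression(n_components=3)
    pls.fit(X[train_idx], y[train_idx])
    T_train = pls.transform(X[train_idx])
    T_test = pls.transform(X[test_idx])

    # Step 2: fit Naive Bayes on the PLS scores (standard Gaussian NB here;
    # kernel or Laplace-corrected NB would replace this estimator).
    nb = GaussianNB().fit(T_train, y[train_idx])
    accuracies.append(nb.score(T_test, y[test_idx]))

print(f"Mean Monte Carlo CV accuracy: {np.mean(accuracies):.3f}")
```

The number of PLS components and the split proportions are illustrative choices, not values reported in the abstract.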