Abstract

Classification studies are widely applied in many areas of research. In this study, we use classification analysis to explore approaches to the classification problem for a large number of measures, using partial least squares discriminant analysis (PLS-DA) and decision trees (DT). The performance of the two methods was compared using sample data on breast tissues from the University of Wisconsin Hospital. PLS-DA and DT were used to predict the diagnosis of breast tissues (M = malignant, B = benign). A total of 699 patient diagnoses (458 benign and 241 malignant) were used in this study. The performance of PLS-DA and DT was evaluated based on the misclassification error and accuracy rate. The results show that PLS-DA can be considered a good and reliable technique for classification tasks on large datasets, with good prediction accuracy.

Highlights

  • In multivariate classification, the aim is to find a mathematical model that can identify the class membership of each sample on the basis of a set of measurements

  • The aim of this study is to investigate the performance of two classification methods, PLS discriminant analysis and decision tree analysis, for predicting the diagnosis of breast tissues

  • The performance of partial least squares discriminant analysis (PLS-DA) and decision trees (DT) was evaluated based on the misclassification error rate and the accuracy rate, i.e. the percentage of testing samples that the model classifies correctly
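The two evaluation measures named above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R code: the function names `accuracy_rate` and `misclassification_error` are assumptions, and the "predict benign for everyone" baseline is a hypothetical reference point built only from the class counts given in the abstract (458 benign, 241 malignant). It is not a result reported by the study.

```python
def accuracy_rate(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def misclassification_error(y_true, y_pred):
    """Fraction of samples classified incorrectly (1 - accuracy)."""
    return 1.0 - accuracy_rate(y_true, y_pred)


# Class counts taken from the abstract: 458 benign (B), 241 malignant (M).
y_true = ["B"] * 458 + ["M"] * 241

# Hypothetical baseline (an assumption for illustration only):
# always predict the majority class.
y_baseline = ["B"] * len(y_true)

baseline_accuracy = accuracy_rate(y_true, y_baseline)
baseline_error = misclassification_error(y_true, y_baseline)

print(f"baseline accuracy: {baseline_accuracy:.4f}")  # 0.6552
print(f"baseline error:    {baseline_error:.4f}")     # 0.3448
```

With these class proportions, a classifier needs to exceed roughly 65.5% accuracy to do better than always predicting the majority class.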


Summary

Introduction

In multivariate classification, the aim is to find a mathematical model that can identify the class membership of each sample on the basis of a set of measurements. The aim of this study is to investigate the performance of two classification methods, PLS discriminant analysis and decision tree analysis, for predicting the diagnosis of breast tissues. The decision tree [9] is the most important technique for classification problems involving breast cancer databases and the medical field. Simulation results in [11] support the use of a priority-based decision tree algorithm for the SEER breast cancer dataset. Velmurugan [3] discussed several algorithms, such as C4.5, ID3, and CART (Classification and Regression Trees), for classifying data using decision trees. The decision tree provides a powerful technique for classification and prediction in the breast cancer diagnosis problem [14]. This study investigates the performance of PLS-DA and decision trees on a large dataset for predicting the diagnosis of breast tissues.
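To make the decision tree idea concrete, the sketch below shows how a CART-style tree might choose a single split: it tries each candidate threshold on one numeric feature and keeps the one with the lowest weighted Gini impurity. ID3 and C4.5 use entropy-based criteria instead (information gain and gain ratio). This is a Python illustration under stated assumptions, not the study's R code: the function names `gini` and `best_split` and the six-sample toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy feature (not from the study's dataset): small values are
# benign (B), large values are malignant (M).
toy_values = [1, 2, 3, 7, 8, 9]
toy_labels = ["B", "B", "B", "M", "M", "M"]

threshold, impurity = best_split(toy_values, toy_labels)
print(threshold, impurity)  # 5.0 0.0
```

A full tree repeats this search over every feature and recurses into the two resulting subsets until a stopping rule is met.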

Materials and Methods
Construction of PLS-DA
Steps for building the PLS-DA model
R-coding for PLS-DA
Construction of Decision Trees
R-coding for DT
Result and Analysis
Findings
Conclusion