Abstract

Many methods exist for classification problems, and they are applied in a wide range of fields. Support Vector Machine (SVM) is currently a popular classification method that many researchers have proposed for such problems. Even with the same method and the same dataset, different ways of dividing the data into training and testing sets can yield different prediction accuracy, which is crucial in classification. In this paper, we compare the prediction accuracy of SVM and Logistic Regression to determine which method better classifies the current condition of a patient after undergoing treatment. Several data-processing steps are applied, including feature selection, feature extraction, and separation of the training and testing data using Holdout and K-Fold cross-validation (CV). Stepwise selection is used to reduce the number of features. The training and testing datasets are obtained using five stratified and non-stratified holdout splits and five-fold stratified and non-stratified cross-validation. The results show that the best method for classifying the cancer dataset is SVM with a radial kernel under five-fold stratified cross-validation. The obtained accuracy is 81.816%, with a variance of 0.94%.
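The difference between stratified and non-stratified five-fold splitting can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy labels (70 negatives, 30 positives), and the use of population variance for summarising per-fold accuracies are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def kfold_indices(n_samples, k=5, seed=0):
    """Non-stratified k-fold: shuffle all indices, then deal them into k folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def stratified_kfold_indices(labels, k=5, seed=0):
    """Stratified k-fold: shuffle within each class, then deal each class
    round-robin across the folds so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def mean_and_variance(values):
    """Mean and population variance (an assumed choice) of per-fold scores."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 70 negatives (0) and 30 positives (1).
    toy_labels = [0] * 70 + [1] * 30
    plain = kfold_indices(len(toy_labels), k=5, seed=1)
    strat = stratified_kfold_indices(toy_labels, k=5, seed=1)
    # Positives per fold: varies for the plain split, fixed at 6 when stratified.
    print("non-stratified:", [sum(toy_labels[i] for i in f) for f in plain])
    print("stratified:    ", [sum(toy_labels[i] for i in f) for f in strat])
```

In a full comparison, each fold would serve once as the testing set while a classifier is trained on the remaining four, and `mean_and_variance` would summarise the five resulting accuracies.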
