Background & Aim: In this study, efficient Support Vector Machine (SVM) algorithm for feature selection and classification of multi-category tumour classes of biological samples using gene expression profiles was proposed. Methods: Feature selection interface of the algorithm employed the F-statistic of the ANOVA–like testing scheme at some chosen family-wise-error-rate which ensured efficient detection of false-positive genes. The selected gene subsets using the above method were further screened for optimality using the Misclassification Error Rates yielded by each of them and their combinations in a sequential selection manner. In a 10-fold cross-validation, the optimal values of the SVM parameters with appropriate kernel were determined for tissue sample classification using one-versus-all approach. The entire data matrix was randomly partitioned into 95% training set to train the SVM classifier and 5% test set to evaluate the predictive performance of the classifier over 1,000 Monte-Carlo cross-validation runs. Published microarray breast cancer dataset with five clinical endpoints was employed to validate the results from the simulation studies. Results: Results from Monte-Carlo study showed excellent performance of the SVM classifier with higher prediction accuracy of the tissue samples based on the few gene biomarkers selected by the proposed feature selection method. Conclusion: SVM could be considered as a classification of multi-category tumour classes of biological
Read full abstract