Abstract
Classification of High Dimension Low Sample Size (HDLSS) datasets is a challenging task in supervised learning. Such datasets are prevalent in many areas, including biomedical applications and business analytics. In this paper, a new embedded feature selection method for HDLSS datasets is introduced by incorporating sparsity into Proximal Support Vector Machines (PSVMs). Our method, called Sparse Proximal Support Vector Machines (sPSVMs), learns a sparse representation of PSVMs by first casting the PSVM as an equivalent least squares problem and then introducing an l1-norm penalty for sparsity. An efficient algorithm based on alternating optimization techniques is proposed. sPSVMs remove more than 98% of the features in many high dimensional datasets without compromising generalization performance. The stability of the feature selection process of sPSVMs is also studied and compared with that of univariate filter techniques. Additionally, sPSVMs offer the advantage of interpreting the selected features in the context of the classes, by inducing class-specific local sparsity rather than the global sparsity of other embedded methods. sPSVMs appear to be robust with respect to data dimensionality. Moreover, sPSVMs perform feature selection and classification in a single step, eliminating the need for a separate dimensionality reduction stage. To that end, sPSVMs can be used for preprocessing-free classification tasks.