Twin support vector machines (TWSVMs) have been shown to be effective classifiers for a range of pattern classification tasks. However, the TWSVM formulation suffers from several shortcomings: (i) TWSVM uses the hinge loss function, which makes it sensitive to outliers in the data (noise sensitivity); (ii) it requires a matrix inversion in the Wolfe-dual formulation, which is intractable for datasets with large numbers of features or samples; (iii) TWSVM minimizes the empirical risk rather than the structural risk in its formulation, with a consequent risk of overfitting. This paper proposes a novel large-scale pinball twin support vector machine (LPTWSVM) to address these shortcomings. First, the proposed LPTWSVM model uses the pinball loss function to achieve a high level of noise insensitivity, especially for data with substantial feature noise. Second, and most significantly, the LPTWSVM formulation eliminates the need to compute inverse matrices in the dual problem (which, apart from being computationally demanding, may not be possible due to matrix singularity). Further, LPTWSVM does not employ kernel-generated surfaces for the non-linear case, instead using the kernel trick directly; this ensures that LPTWSVM is a fully modular kernel approach, in contrast to the original TWSVM. Lastly, structural risk is explicitly minimized in LPTWSVM, with a consequent improvement in classification accuracy (we explicitly analyze the classification accuracy and noise insensitivity of the proposed LPTWSVM). Experiments on benchmark datasets show that the proposed LPTWSVM model can be effectively deployed on large datasets and exhibits similar or better performance than relevant baseline methods on most datasets.
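For reference, a common form of the pinball loss in the pinball-SVM literature is sketched below; the exact parameterization adopted in LPTWSVM may differ. With margin variable $u = 1 - y f(x)$ and parameter $\tau \in [0, 1]$,

$$
L_{\tau}(u) =
\begin{cases}
u, & u \ge 0,\\
-\tau u, & u < 0,
\end{cases}
$$

so that $\tau = 0$ recovers the hinge loss. Unlike the hinge loss, which is flat for $u < 0$, the pinball loss also assigns a (small) penalty to points classified well beyond the margin; this is what is generally credited with making pinball-based classifiers less sensitive to feature noise and resampling near the decision boundary.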