One-Class Text Document Classification with OCSVM and LSI

B Shravan Kumar, Vadlamani Ravi

doi:10.1007/978-981-10-3174-8_50

Abstract

In this paper, we propose a novel one-class approach to text document classification that uses a One-Class Support Vector Machine (OCSVM) and Latent Semantic Indexing (LSI) in tandem. We first apply t-statistic-based feature selection to the text corpus. We then train the OCSVM on the rows of the document-term matrix that correspond to the negative class and extract the Support Vectors (SVs). In the test phase, we employ LSI to compare query documents from the positive class with the SVs extracted from the negative class, and we compute a match score using the cosine similarity measure. Finally, we classify documents into the positive category based on a prespecified threshold on the match score. Comparing query documents against the SVs alone reduces the computational load, which is the main contribution of the paper. We demonstrate the effectiveness of our approach on datasets pertaining to phishing and to sentiment analysis in a bank.
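The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy documents, the TF-IDF weighting, the linear kernel, `nu=0.5`, the two LSI components, the threshold of 0.5, and the helper name `match_scores` are all assumptions made for the example. The t-statistic feature selection step is omitted, and the decision rule (a query whose best match against the negative-class SVs falls below the threshold is labelled positive) is one plausible reading of the abstract.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import OneClassSVM

# Toy negative-class corpus (hypothetical; stands in for legitimate emails).
negative_docs = [
    "meeting agenda for the quarterly budget review",
    "lunch schedule and team meeting notes",
    "quarterly report attached for review",
    "team budget notes and project schedule",
    "project report and meeting agenda",
    "notes from the budget review meeting",
]
# Toy query documents: one negative-like, one sharing no vocabulary.
query_docs = [
    "team meeting schedule and budget notes",
    "urgent verify password click link",
]

# Document-term matrix built from the negative class only.
vectorizer = TfidfVectorizer()
X_neg = vectorizer.fit_transform(negative_docs).toarray()

# Train the OCSVM on the negative class and keep only its support vectors.
ocsvm = OneClassSVM(kernel="linear", nu=0.5).fit(X_neg)
support_vectors = ocsvm.support_vectors_

# LSI: fit a truncated SVD on the support vectors, then fold queries in.
lsi = TruncatedSVD(n_components=2, random_state=0).fit(support_vectors)
sv_lsi = lsi.transform(support_vectors)


def match_scores(docs):
    """Return each document's best cosine similarity to any support vector."""
    q_lsi = lsi.transform(vectorizer.transform(docs).toarray())
    return cosine_similarity(q_lsi, sv_lsi).max(axis=1)


threshold = 0.5  # assumed value; the paper uses a prespecified threshold
scores = match_scores(query_docs)
is_positive = scores < threshold  # low similarity to negative SVs => positive

for doc, score, flag in zip(query_docs, scores, is_positive):
    print(f"{score:+.3f}  positive={bool(flag)}  {doc}")
```

Because each query is compared only with the support vectors rather than with every training document, the cost of scoring scales with the number of SVs, which is the computational saving the abstract highlights.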

Full Text