Abstract

In this paper, we propose a novel one-class classification approach for text documents that uses a One-Class Support Vector Machine (OCSVM) and Latent Semantic Indexing (LSI) in tandem. We first apply t-statistic-based feature selection to the text corpus. We then train the OCSVM on the rows of the document-term matrix that correspond to the negative class and extract the Support Vectors (SVs). In the test phase, we employ LSI to compare query documents from the positive class with the SVs extracted from the negative class, computing a match score with the cosine similarity measure. Documents are then assigned to the positive category based on a prespecified threshold on the match score. Comparing queries against the SVs alone, rather than against every training document, reduces the computational load, which is the main contribution of the paper. We demonstrate the effectiveness of our approach on datasets pertaining to phishing and to sentiment analysis in a bank.
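
As a rough illustration of two of the steps described above, the following Python sketch scores terms with a t-statistic and computes a cosine-similarity match score against a set of support vectors. It is not the authors' implementation. Several choices are assumptions made only for this example: Welch's form of the t-statistic, taking the maximum similarity over all SVs as the match score, and labelling a query positive when that score falls below the threshold (the abstract does not state the direction of the comparison). The function names are hypothetical. OCSVM training and the LSI projection are omitted, so the vectors passed to `match_score` are assumed to be already in the reduced space.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic between two samples of one term's weights.

    Assumption: the abstract only says "t-statistic-based"; Welch's form
    is used here for illustration. Each sample needs at least two values.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_terms(rows_a, rows_b, k):
    """Indices of the k terms with the largest |t| between two groups of rows."""
    n_terms = len(rows_a[0])
    scores = [
        abs(welch_t([r[j] for r in rows_a], [r[j] for r in rows_b]))
        for j in range(n_terms)
    ]
    return sorted(range(n_terms), key=lambda j: scores[j], reverse=True)[:k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_score(query, support_vectors):
    """Assumption: the match score is the best similarity to any SV."""
    return max(cosine(query, sv) for sv in support_vectors)


def is_positive(query, support_vectors, threshold):
    """Assumption: a query unlike every negative-class SV is labelled positive."""
    return match_score(query, support_vectors) < threshold
```

Because each query is compared only with the support vectors, the cost of scoring one query grows with the number of SVs rather than with the size of the training set.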
