Margin Maximization of Text Classification based on Support Vector Machine

Kavyasri G

doi:10.22214/ijraset.2023.57420

Abstract

This project develops a margin-maximization model for topic categorization using Support Vector Machines (SVM). Topic categorization classifies text documents into predefined topic categories. The proposed model relies on SVM's ability to find a separating hyperplane with maximum margin between topics, which improves classification performance. The project begins with dataset preparation, in which a labeled dataset covering a wide range of topics is gathered. The text data is preprocessed through cleaning, normalization, and conversion into numerical representations. Feature extraction techniques such as TF-IDF or word embeddings capture the features that characterize each topic. The dataset is split into training and testing sets, and the SVM model is trained on the training set. Training focuses on maximizing the margin, using a soft-margin formulation or the kernel trick to handle data that are not linearly separable. The model's hyperparameters are tuned to optimize performance. The trained model then predicts the topics of the testing data, and its performance is evaluated with standard metrics: accuracy, precision, recall, and F1 score. The findings of this project demonstrate the effectiveness of the margin-maximization model for topic categorization using SVM.
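The pipeline summarized above (TF-IDF features, a soft-margin linear SVM, and evaluation with accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. It is not the implementation used in this project. It assumes a binary task, a homogeneous hyperplane (no bias term), and batch subgradient descent on the regularized hinge loss. The toy documents, the function names, and the values of `lam`, `lr`, and `epochs` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation.
    tokens = (t.strip(".,!?;:()\"'") for t in text.lower().split())
    return [t for t in tokens if t]

def fit_idf(docs):
    # Smoothed inverse document frequency over the training vocabulary.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}

def tfidf(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i))
    # by batch subgradient descent. Labels must be +1 or -1.
    w, n = {}, len(X)
    for _ in range(epochs):
        # Points inside the margin or misclassified under the current w.
        violators = [(x, yi) for x, yi in zip(X, y) if yi * dot(w, x) < 1.0]
        # Shrinkage from the L2 term, which drives margin maximization.
        w = {t: (1.0 - lr * lam) * v for t, v in w.items()}
        # Hinge-loss subgradient contributed by the violators.
        for x, yi in violators:
            for t, v in x.items():
                w[t] = w.get(t, 0.0) + lr * yi * v / n
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1

def evaluate(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy corpus: +1 = sports, -1 = politics.
train_docs = [
    "team wins football match", "striker scores late goal",
    "coach praises team defence", "football league season starts",
    "parliament passes budget bill", "minister announces election date",
    "senate debates tax bill", "election campaign budget promises",
]
train_labels = [1, 1, 1, 1, -1, -1, -1, -1]
test_docs = [
    "football coach scores goal", "senate passes tax budget",
    "striker joins league", "minister debates election",
]
test_labels = [1, -1, 1, -1]

idf = fit_idf(train_docs)
weights = train_linear_svm([tfidf(d, idf) for d in train_docs], train_labels)
y_pred = [predict(weights, tfidf(d, idf)) for d in test_docs]
metrics = evaluate(test_labels, y_pred)
print(y_pred, metrics)
```

In this formulation, a smaller `lam` corresponds to a larger penalty parameter C in the usual soft-margin notation, so `lam` is the natural hyperparameter to tune on a validation split. The toy classes use disjoint vocabularies and are trivially separable. Real topic data have overlapping vocabularies and would not be expected to behave this cleanly.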

Full Text