Abstract

Millions of files are uploaded and downloaded every minute, creating data at a scale that makes manual text categorization infeasible. Documents therefore need to be categorized automatically so that storage and retrieval become more efficient. This paper proposes a hybrid text categorization model that combines the Rocchio algorithm with the Random Forest algorithm to perform multi-label text categorization. A stop-word remover and a word stemmer are used to overcome the limitations of the Rocchio algorithm. The Random Forest model takes a minimal set of categories as input to reduce its error rate. Experiments were conducted on standard text categorization datasets. The proposed model was found to categorize documents more efficiently than other text categorization models, namely fuzzy relevance clustering, ML-KNN (multi-label KNN), and Naive Bayes.
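The abstract does not give the details of the pipeline, so the following is only a minimal sketch of the Rocchio-style stage under stated assumptions: documents are preprocessed with stop-word removal and stemming, each category is represented by the centroid of its training vectors, and a new document is matched against the centroids by cosine similarity to produce a short list of candidate categories. In the hybrid model described above, such a reduced category set would then be passed to a Random Forest, which is omitted here. The stop-word list, the `crude_stem` suffix stripper, the function names, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on", "for"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, drop punctuation and stop words, then stem."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]


def to_vector(text):
    """Length-normalized term-frequency vector as a dict."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def train_centroids(labelled_docs):
    """Average the vectors of each category's documents (multi-label aware)."""
    sums = defaultdict(lambda: defaultdict(float))
    doc_counts = Counter()
    for text, labels in labelled_docs:
        vec = to_vector(text)
        for label in labels:
            doc_counts[label] += 1
            for term, weight in vec.items():
                sums[label][term] += weight
    return {
        label: {term: w / doc_counts[label] for term, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(text, centroids, k=2):
    """Return the k categories whose centroids are most similar to the text."""
    vec = to_vector(text)
    ranked = sorted(centroids, key=lambda c: cosine(vec, centroids[c]), reverse=True)
    return ranked[:k]


# Toy multi-label training set (hypothetical data for illustration only).
TRAINING = [
    ("The striker scored two goals in the match", {"sports"}),
    ("Stocks and bonds rallied on the market", {"finance"}),
    ("The club sold shares to fund the stadium", {"sports", "finance"}),
    ("New vaccine trials show promising results", {"health"}),
]
CENTROIDS = train_centroids(TRAINING)
```

Calling `shortlist("Goals scored in the final match", CENTROIDS, k=1)` ranks the categories for a new document. A larger `k` keeps more candidates for a downstream classifier at the cost of a larger label space.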

