Abstract

With the growing volume of electronic documents, automatic text classification has become increasingly important, since manual classification is time-consuming. Machine learning is the main approach to automatic text classification: texts are represented, terms are weighted on the basis of the chosen representation, and a classification model is built. The vector space model is the dominant text representation, largely due to its simplicity. Graphs are emerging as an alternative text representation that can capture important information in text, such as term order, term co-occurrence and term relationships, which the vector space model does not consider. Term weighting schemes that use a graph representation have been shown to yield substantially better text classification performance. In this paper, we introduce tw-srw, an effective graph-based supervised term weighting scheme that uses co-occurrence information in text to increase text classification accuracy. Experimental results show that it outperforms state-of-the-art unsupervised term weighting schemes.
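
To make the idea of graph-based term weighting concrete, the following is a minimal sketch of a generic, unsupervised variant: terms become nodes, co-occurrence within a sliding window becomes weighted edges, and a PageRank-style random walk scores each term. It is not the tw-srw scheme itself, whose supervised details are not given here. The function names, the window size of 3, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=3):
    """Build an undirected graph of terms.

    An edge weight counts how often two distinct terms appear
    within `window` consecutive tokens (window size is an assumption).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != term:
                graph[term][other] += 1.0
                graph[other][term] += 1.0
    return graph


def random_walk_weights(graph, damping=0.85, iterations=50):
    """Score terms with a PageRank-style random walk over the graph.

    Illustrative only: a generic unsupervised weighting, not tw-srw.
    """
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {t: 1.0 / n for t in nodes}
    # Every node in the graph has at least one edge, so totals are > 0.
    total_edge_weight = {t: sum(graph[t].values()) for t in nodes}
    for _ in range(iterations):
        updated = {}
        for t in nodes:
            # Mass arriving from each neighbour, split by edge weight.
            incoming = sum(
                scores[nb] * w / total_edge_weight[nb]
                for nb, w in graph[t].items()
            )
            updated[t] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


tokens = "graph based term weighting uses term co-occurrence in a graph".split()
weights = random_walk_weights(build_cooccurrence_graph(tokens))
for term, score in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In a classification pipeline, scores of this kind would stand in for frequency-based weights in each document's feature vector. A supervised scheme would additionally use class labels when computing the weights, which this sketch does not attempt.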
