Abstract

Text classification can be viewed as the task of assigning texts to a predefined set of categories. However, many digital documents are not organized according to their contents, which makes it difficult for a user to find relevant documents. Automatic text classification can address this problem. In this paper, we introduce a new random-walk term weighting method for improved text classification. To weight a term, our approach exploits the relationship between local information (term position, term frequency) and global information (inverse document frequency, information gain) about terms (the graph vertices). Moreover, we weight terms by considering the co-occurrence and semantic relations between terms as a measure of dependency. To evaluate our term weighting approach, we integrate it into the Rocchio text classification algorithm; experimental results show that our method performs better than other random-walk models.
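
To make the general idea of random-walk term weighting concrete, the following is a minimal sketch of a generic PageRank-style walk over a term co-occurrence graph. It is not the method proposed in this paper: it uses unweighted co-occurrence edges only and ignores term position, inverse document frequency, information gain, and semantic relations. The function name `random_walk_term_weights` and the parameters `window`, `damping`, and `iterations` are assumptions chosen for illustration.

```python
from collections import defaultdict


def random_walk_term_weights(tokens, window=2, damping=0.85, iterations=50):
    """Score terms with a PageRank-style walk over a co-occurrence graph.

    Illustrative only: edges are unweighted, and two terms are linked when
    they appear at most `window` positions apart in the token sequence.
    """
    # Build an undirected co-occurrence graph (assumed edge definition).
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)

    vocab = sorted(set(tokens))
    if not vocab:
        return {}

    # Start from a uniform distribution over the vocabulary.
    scores = {t: 1.0 / len(vocab) for t in vocab}
    for _ in range(iterations):
        updated = {}
        for t in vocab:
            # Each neighbor splits its current score evenly among its links.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[t])
            updated[t] = (1.0 - damping) / len(vocab) + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    sample = "term weighting improves text classification with term weighting".split()
    weights = random_walk_term_weights(sample)
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.4f}")
```

In a sketch like this, terms that co-occur with many other terms accumulate higher scores. The resulting weights could then stand in for plain term frequency when building the document vectors used by a centroid-based classifier such as Rocchio.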
