Abstract
Most traditional information retrieval and automatic text classification methods based on the vector space model need to determine the weights of feature terms. Term weighting plays an important role in achieving high performance in information retrieval and text classification. The most popular method uses term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and compute their weights. However, the tf-idf model does not take into account class information, information from important parts of a document such as the title, abstract, and conclusion, or information about synonymous words. This paper presents an improved method for computing term weights that incorporates this information. Experimental results show that performance is improved: the role of important and representative terms is strengthened, and the influence of unimportant feature terms on retrieval and classification is reduced. In addition, the F1 score obtained with the new algorithm is higher than that obtained with the traditional tf-idf model.
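The abstract does not give the paper's actual weighting formula. As a minimal sketch, the classic weight w(t, d) = tf(t, d) · log(N / df(t)) can be extended in two of the directions the abstract names: occurrences are scaled by the section they appear in, and synonyms are merged into one canonical term before counting. The names `SECTION_WEIGHTS`, `synonyms`, and the boost values below are hypothetical assumptions, not taken from the paper, and class information is not modelled at all.

```python
import math
from collections import Counter

# Hypothetical section boosts; the paper's actual values are not given.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "conclusion": 2.0, "body": 1.0}


def weighted_tf(doc, synonyms, section_weights):
    """Section-weighted term frequency.

    doc: dict mapping a section name to its list of tokens.
    Each occurrence counts by its section's weight (default 1.0), and
    tokens are first mapped to a canonical form via the synonym table.
    """
    tf = Counter()
    for section, tokens in doc.items():
        weight = section_weights.get(section, 1.0)
        for token in tokens:
            tf[synonyms.get(token, token)] += weight
    return tf


def idf(docs_tf):
    """Plain idf = log(N / df), where df counts documents containing the term."""
    n = len(docs_tf)
    df = Counter()
    for tf in docs_tf:
        df.update(tf.keys())
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(docs, synonyms=None, section_weights=SECTION_WEIGHTS):
    """Return one {term: weight} dict per document.

    With an empty synonym table and all section weights equal to 1.0,
    this reduces to classic raw-tf x log(N/df) weighting.
    """
    synonyms = synonyms or {}
    docs_tf = [weighted_tf(d, synonyms, section_weights) for d in docs]
    idf_map = idf(docs_tf)
    return [{t: f * idf_map[t] for t, f in tf.items()} for tf in docs_tf]
```

For example, with two documents `[{"title": ["retrieval"], "body": ["weighting", "term"]}, {"body": ["categorization", "term"]}]` and `synonyms={"categorization": "classification"}`, "retrieval" receives three times the weight it would get in the body alone, "term" gets weight 0 because it occurs in every document, and "categorization" is counted as "classification". Passing `section_weights={}` falls back to ordinary tf-idf.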