Abstract

Categorization plays a major role in information retrieval. Abstracts of research documents contain too few terms for existing categorization algorithms to produce accurate results, which leads to unsatisfactory categorization. This paper proposes a three-stage categorization scheme to improve the accuracy of categorizing the abstracts of research documents. In most cases, an abstract draws on context from the information surrounding it. In the first stage, the context is extracted from the environment in which the abstract appears; the proposed system performs this context gathering as a continuous process. In the second stage, the short text is subjected to general NLP techniques, and the system divides the terms in the abstract into hierarchical levels of context. Only the terms contributing to the higher levels of context are carried forward to the later stages of categorization. In the final stage, the system applies a weighted-terms method to categorize the abstract. When uncertainty arises from the limited number of terms, the context obtained in the first stage is used to resolve it. Relating the context to the content of the short text in this way provides better accuracy and leads to more effective content filtering in information retrieval. In experiments on the categorization of short texts, the proposed method achieved better accuracy than traditional feature-based categorization.
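The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category vocabularies, term weights, stopword list, and the `margin` threshold are all hypothetical, the surrounding context is simply passed in as a string, and the hierarchical levels of context are reduced to a single filter that keeps only terms known to a category vocabulary.

```python
from collections import Counter

# Hypothetical category vocabularies with term weights (illustrative only).
CATEGORY_TERMS = {
    "machine_learning": {"classifier": 2.0, "training": 1.5, "network": 1.0},
    "networking": {"routing": 2.0, "protocol": 1.5, "network": 1.0},
}

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "is", "to", "with"}


def tokenize(text):
    """Lowercase the text, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def score(terms, weights_by_category):
    """Sum the weights of the terms that occur, per category."""
    counts = Counter(terms)
    return {
        category: sum(weight * counts[term] for term, weight in weights.items())
        for category, weights in weights_by_category.items()
    }


def categorize(abstract, context, margin=0.5):
    """Categorize a short text, using its context only to break near-ties."""
    # Stage 1: context gathered from the environment (assumed given as text).
    context_scores = score(tokenize(context), CATEGORY_TERMS)

    # Stage 2: basic NLP, then keep only terms that carry category-level
    # context (a stand-in for the hierarchical levels in the abstract).
    known_terms = set().union(*CATEGORY_TERMS.values())
    terms = [t for t in tokenize(abstract) if t in known_terms]

    # Stage 3: weighted-terms scoring of the short text itself.
    scores = score(terms, CATEGORY_TERMS)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # Too few terms to separate the top two categories: add the context.
    if scores[best] - scores[runner_up] < margin:
        combined = {c: scores[c] + context_scores[c] for c in scores}
        best = max(combined, key=combined.get)
    return best
```

In this sketch the context never overrides a clear result from the short text; it only contributes when the top two category scores are within `margin` of each other, which mirrors the role the abstract gives it in resolving uncertainty.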
