Abstract

Information Filtering (IF), which has been widely studied in recent years, applies document retrieval techniques to cope with very large volumes of information. The two major components of an IF system are modelling the user’s interest and filtering relevant documents, and various approaches have been proposed for the first of these. In this study, we use a topic-modelling technique, Latent Dirichlet Topic Modelling, to model the user’s interest in IF systems. In particular, we propose an extended model for representing the user’s interest, Latent Dirichlet Topic Modelling with high Frequency Occurrences (abbreviated as LDA_HF), with the aim of improving the retrieval performance of IF systems. The new model was compared with existing methods for modelling the user’s interest, such as BM25, pLSA, and LDA_IF, on the large benchmark datasets RCV1 and R8. Extensive experiments showed that the proposed model outperformed all of these state-of-the-art baseline models on four major evaluation metrics: Top20, B/P, MAP, and F1. Hence, LDA_HF is a promising and reliable method for enhancing the performance of IF systems.
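
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how three of them (Top20, MAP, and F1) are commonly computed from a ranked list of relevance judgements. The abstract does not give the authors' exact definitions, so the function names and the textbook formulations used here are assumptions for illustration, not the evaluation code of the study; B/P (the break-even point) is omitted.

```python
from statistics import mean

# Illustrative sketch only: textbook definitions, not the authors' evaluation code.
# A ranking is a list of 0/1 relevance labels ordered from best to worst score.

def precision_at_k(ranked_relevance, k=20):
    """Top-k precision: fraction of the first k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """Average of the precision values taken at each relevant document's rank.

    Assumes every relevant document appears somewhere in the ranking.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: mean of the per-topic average precision values."""
    return mean(average_precision(r) for r in rankings) if rankings else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

if __name__ == "__main__":
    ranking = [1, 0, 1, 1, 0]  # hypothetical relevance labels for one topic
    print(precision_at_k(ranking, k=5))
    print(average_precision(ranking))
    print(f1_score(0.5, 0.5))
```

In this sketch, a Top20 score is simply `precision_at_k` with `k=20`, and MAP averages `average_precision` over all topics in a collection.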
