Abstract

User interest modeling is the core of personalized services. It is applied in information retrieval, data mining, e-commerce, and personalized recommendation to improve the quality of information services. Most traditional user interest models are built on the vector space model (VSM) and use keywords to represent user interests. However, these models ignore both the hierarchical granularity relations between keywords and the domain knowledge hidden in users' specific concepts or topics of interest. As a result, they struggle to express user interests accurately and reasonably. Motivated by this, we propose a Graph-based Chinese Phrases Hierarchical Clustering algorithm, called GCPHC. It organizes user interests in a hierarchical tree structure, uses a HowNet-based Maximum Matching Mapping method, called HNM3, to map user interests to topics of the ODP (Open Directory Project), and builds a hierarchical user interest model in which each cluster is labeled with a topic. To find the configuration with the best performance, we evaluate five correlation functions (AEMI, AEMI3, IT, PS, and Support) within GCPHC under varying data scales and parts of speech (POS). Extensive experiments show that GCPHC with the AEMI correlation function performs as well as with AEMI3, and outperforms the other correlation functions when the data scale ranges from 20 to 30 documents and nouns are used as terms. In these cases, the average RGC (Rate of Good Clusters) of GCPHC with AEMI reaches 74.7%, exceeding the results obtained with the other correlation functions.
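The abstract names the correlation functions and the graph-based clustering idea but does not define them. As a rough illustration only, the Python sketch below assumes each document is a set of extracted terms and implements three of the named functions under commonly used definitions: Support as the co-occurrence probability P(a,b), PS as the Piatetsky-Shapiro measure P(a,b) - P(a)P(b), and AEMI as the mutual-information term for (a,b) minus the terms for (a,not b) and (not a,b). It then links terms whose correlation exceeds a threshold and takes connected components as clusters. The helper names, the threshold, and the toy data are hypothetical; this is not the paper's GCPHC or HNM3, and AEMI3 and IT are omitted because their definitions are not given here.

```python
import math
from itertools import combinations

# Hypothetical toy corpus: each document is the set of terms extracted from it.
SAMPLE_DOCS = [
    {"phone", "battery", "screen"},
    {"phone", "battery"},
    {"football", "match"},
    {"football", "match", "player"},
]
SAMPLE_TERMS = ["phone", "battery", "screen", "football", "match", "player"]


def probabilities(docs, a, b):
    """Document-level P(a), P(b), and P(a, b) (assumed estimation scheme)."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    return p_a, p_b, p_ab


def _mi_term(p_xy, p_x, p_y):
    """One mutual-information term p(x,y) * log(p(x,y) / (p(x) p(y))); 0 log 0 = 0."""
    if p_xy <= 0 or p_x <= 0 or p_y <= 0:
        return 0.0
    return p_xy * math.log(p_xy / (p_x * p_y))


def support(docs, a, b):
    """Support: fraction of documents containing both terms."""
    return probabilities(docs, a, b)[2]


def ps(docs, a, b):
    """Piatetsky-Shapiro measure: P(a,b) - P(a)P(b)."""
    p_a, p_b, p_ab = probabilities(docs, a, b)
    return p_ab - p_a * p_b


def aemi(docs, a, b):
    """Assumed AEMI form: MI(a,b) - MI(a,not b) - MI(not a,b)."""
    p_a, p_b, p_ab = probabilities(docs, a, b)
    p_a_notb = p_a - p_ab  # P(a, not b)
    p_nota_b = p_b - p_ab  # P(not a, b)
    return (_mi_term(p_ab, p_a, p_b)
            - _mi_term(p_a_notb, p_a, 1 - p_b)
            - _mi_term(p_nota_b, 1 - p_a, p_b))


def threshold_clusters(docs, terms, corr, threshold):
    """Link term pairs with corr > threshold; return connected components."""
    adj = {t: set() for t in terms}
    for a, b in combinations(terms, 2):
        if corr(docs, a, b) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in terms:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over the term graph
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters


def rgc(num_good, num_total):
    """Rate of Good Clusters: good clusters divided by all clusters."""
    if num_total == 0:
        raise ValueError("no clusters to evaluate")
    return num_good / num_total


if __name__ == "__main__":
    for name, fn in [("AEMI", aemi), ("PS", ps), ("Support", support)]:
        print(name, threshold_clusters(SAMPLE_DOCS, SAMPLE_TERMS, fn, 0.0))
```

On the toy corpus all three functions separate the "phone" terms from the "football" terms at threshold 0. Re-running `threshold_clusters` on each component with a higher threshold is one way such a flat split could be grown into a tree of clusters, and RGC is then the fraction of the resulting clusters that an evaluator judges good.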
