Abstract
An intention expresses the user’s preference for how document structure should be divided. Intention-guided document structure division is an important task in text mining, and deep semi-supervised document clustering offers a promising route to such personalized document clustering. However, traditional deep semi-supervised clustering models are limited by the small number of available constraints, which is insufficient for intention-guided document clustering. Moreover, documents normally call for different emphases in their representations to reflect different structural opinions. In this paper, we propose an intention-guided deep semi-supervised document clustering model, IGSC, which divides document structure based on a small amount of user-provided supervised information. IGSC uses a deep metric learning network to address both problems. The deep metric learner infers the user’s global intention from a small number of user-provided pairwise constraints and outputs an intention matrix, which guides representation learning. IGSC also uses the intention matrix to guide the clustering process, so that the clustering results best match the user’s intention. We compare IGSC with a number of document clustering models on four real-world text datasets: Reu-10k, BBC, ACM, and Abstract. The results show that IGSC improves clustering performance, outperforming the best benchmark result by 7% on average. The comparison with other models and the visualization results demonstrate that IGSC is effective.
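To make the idea of learning a metric from pairwise constraints concrete, the following is a minimal sketch and not the IGSC network itself. It assumes a much simpler stand-in: a diagonal weighting of features, estimated with a ratio heuristic from must-link and cannot-link pairs. The function names, the toy vectors, and the heuristic are all hypothetical illustrations; the abstract does not specify how the intention matrix is computed.

```python
def learn_diagonal_metric(docs, must_link, cannot_link, eps=1e-9):
    """Estimate per-feature weights from pairwise constraints.

    Assumption (not from the paper): a feature matters to the user if it
    varies a lot across cannot-link pairs and little across must-link pairs.
    """
    dim = len(docs[0])

    def mean_sq_diff(pairs):
        # Average squared difference per feature over the given pairs.
        totals = [0.0] * dim
        for i, j in pairs:
            for k in range(dim):
                totals[k] += (docs[i][k] - docs[j][k]) ** 2
        n = max(len(pairs), 1)
        return [t / n for t in totals]

    ml = mean_sq_diff(must_link)
    cl = mean_sq_diff(cannot_link)
    raw = [cl[k] / (ml[k] + eps) for k in range(dim)]
    total = sum(raw) or 1.0
    # Normalize so the weights sum to 1.
    return [r / total for r in raw]


def weighted_sq_distance(a, b, weights):
    """Squared distance under the learned diagonal metric."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Toy 2-feature "documents": the user groups by feature 0 and ignores feature 1.
docs = [
    [0.0, 0.0],
    [0.1, 1.0],
    [1.0, 0.1],
    [0.9, 0.9],
]
must_link = [(0, 1), (2, 3)]    # pairs the user wants in the same cluster
cannot_link = [(0, 2), (1, 3)]  # pairs the user wants in different clusters

weights = learn_diagonal_metric(docs, must_link, cannot_link)
```

Under the plain squared Euclidean distance, document 0 is equally far from documents 1 and 2 in this toy data. With the learned weights, the must-link pair becomes closer than the cannot-link pair, which is the kind of signal a downstream clustering step could use.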
More From: Journal of King Saud University - Computer and Information Sciences