Abstract

This paper describes a new hypergraph formulation for document categorization in which hyperclique patterns (in this case, groups of strongly affiliated documents) are used as hyperedges. Unlike the objects in a frequent itemset, the objects in a hyperclique pattern are guaranteed a minimum level of global pairwise similarity to one another, as measured by the cosine or Jaccard similarity. Since hypergraph partitioning relies mainly on the similarity of the vertices that share a hyperedge, hypercliques may serve as higher-quality hyperedges. Moreover, owing to the additional confidence constraint, the mined patterns can cover more items while the pattern size remains reasonable. Hence, the difficulty of partitioning dense hypergraphs, which is often encountered in frequent-itemset-based hypergraph partitioning, is considerably alleviated. Finally, experiments on real-world datasets show that using hyperclique patterns as hyperedges improves the clustering results in terms of several external validation measures.
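To make the pairwise-similarity guarantee concrete, the sketch below uses the h-confidence measure from hyperclique mining, hconf(P) = supp(P) / max over i in P of supp({i}). For any pair {a, b} inside P, supp({a, b}) >= supp(P) and sqrt(supp({a}) * supp({b})) <= max supp({i}), so cosine(a, b) >= hconf(P); since supp({a}) + supp({b}) - supp({a, b}) <= 2 * max supp({i}), Jaccard(a, b) >= hconf(P) / 2. The toy corpus, the choice of documents as "items" and terms as "transactions", the function names, and the use of h-confidence as a hyperedge weight are illustrative assumptions, not the paper's exact setup.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus (not from the paper): each document is a set of terms.
# Assumption: documents are the "items" and terms the "transactions", so the
# support of a document set is the number of terms shared by all its documents.
DOCS = {
    "d1": {"graph", "partition", "cluster", "vertex"},
    "d2": {"graph", "partition", "cluster", "edge"},
    "d3": {"graph", "partition", "vertex", "edge"},
    "d4": {"protein", "gene", "graph"},
}


def support(pattern):
    """Number of terms that occur in every document of the pattern."""
    return len(set.intersection(*(DOCS[d] for d in pattern)))


def h_confidence(pattern):
    """supp(P) / max_{i in P} supp({i})."""
    return support(pattern) / max(support([d]) for d in pattern)


def cosine(a, b):
    return support([a, b]) / sqrt(support([a]) * support([b]))


def jaccard(a, b):
    both = support([a, b])
    return both / (support([a]) + support([b]) - both)


def hypercliques(min_hconf, max_size=3):
    """Exhaustively enumerate document sets with h-confidence >= min_hconf.

    Brute force is fine for a toy corpus; a real miner would instead prune
    the search using the anti-monotone property of h-confidence.
    """
    found = []
    for k in range(2, max_size + 1):
        for pattern in combinations(sorted(DOCS), k):
            if h_confidence(pattern) >= min_hconf:
                found.append(pattern)
    return found


def pairwise_bounds_hold(pattern, eps=1e-12):
    """Check cosine >= hconf(P) and Jaccard >= hconf(P) / 2 for every pair in P."""
    hc = h_confidence(pattern)
    return all(
        cosine(a, b) >= hc - eps and jaccard(a, b) >= hc / 2 - eps
        for a, b in combinations(pattern, 2)
    )


if __name__ == "__main__":
    patterns = hypercliques(min_hconf=0.5)
    # Illustrative assumption: each hyperclique becomes a hyperedge
    # weighted by its h-confidence.
    hyperedges = {p: h_confidence(p) for p in patterns}
    for edge, weight in hyperedges.items():
        print(edge, round(weight, 2), pairwise_bounds_hold(edge))
```

With a threshold of 0.5, the three mutually similar documents d1, d2, d3 form hyperedges, while every set containing d4 (which shares only the term "graph" with the others) has h-confidence 0.25 and is excluded, even though a plain support threshold of 1 would have admitted it.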
