순차패턴에 기반한 XML 문서 클러스터링

Jeong-Hee Hwang ,Keun-Ho Ryu

doi:10.3745/kipstd.2003.10d.7.1093

Abstract

As the use of internet is growing, the amount of information is increasing rapidly and XML that is a standard of the web data has the property of flexibility of data representation. Therefore electronic document systems based on web, such as EDMS (Electronic Document Management System), ebXML (e-business extensible Markup Language), have been adopting XML as the method for exchange and standard of documents. So research on the method which can manage and search structural XML documents in an effective wav is required. In this paper we propose the clustering method based on structural similarity among the many XML documents, using typical structures extracted from each document by sequential pattern mining in pre-clustering process. The proposed algorithm improves the accuracy of clustering by computing cost considering cluster cohesion and inter-cluster similarity.

Full Text