Abstract
Clustering XML documents by structure has generally been accomplished by looking at the occurrence of a single pre-established type of structural component in the structures of the XML documents. Focusing on only one type of structural component is likely to produce clusters with some degree of internal structural inhomogeneity, either because differences in the structures of the XML documents go uncaught or because the chosen structural component is inappropriate. To overcome these limitations, a new parameter-free approach to clustering XML documents is proposed that considers multiple types of structural components simultaneously in order to isolate structurally homogeneous clusters of XML documents. The idea behind the approach is to represent each XML document as a transaction of boolean features that highlight a suitable selection of its structural components. A parameter-free clustering scheme is then used to isolate structurally homogeneous clusters. A comparative evaluation over both real and synthetic XML data provides evidence of the effectiveness and efficacy of the devised approach.
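To make the transactional representation concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes three illustrative types of structural component (element tags, parent-child edges, and root-to-node tag paths); the abstract does not name the components actually selected. The function names `structural_transaction` and `jaccard` are hypothetical, and the Jaccard coefficient is used only to compare two transactions, not as a stand-in for the parameter-free clustering scheme.

```python
import xml.etree.ElementTree as ET


def structural_transaction(xml_text):
    """Return the set of structural features present in one XML document.

    A feature is "true" for the document if and only if it is in the set,
    so the set plays the role of a transaction of boolean features.
    The three feature types below are assumptions made for illustration.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def visit(node, path):
        current = path + (node.tag,)
        features.add(("tag", node.tag))             # element tag occurs
        features.add(("path", "/".join(current)))   # root-to-node tag path occurs
        for child in node:
            features.add(("edge", node.tag, child.tag))  # parent-child edge occurs
            visit(child, current)

    visit(root, ())
    return frozenset(features)


def jaccard(a, b):
    """Jaccard similarity between two transactions (sets of features)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><name/></author><title/></book>"
doc_c = "<article><title/></article>"

ta = structural_transaction(doc_a)
tb = structural_transaction(doc_b)
tc = structural_transaction(doc_c)
```

Because the features are boolean, the repeated `title` element in `doc_b` adds nothing new, so `ta` and `tb` are identical transactions. `doc_c` shares only the `title` tag feature with `doc_a`: its path `article/title` and edge `article -> title` differ, which is the kind of distinction a single component type such as tags alone would capture less sharply.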