A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation

Narges Shafieian

doi:10.4236/am.2015.66098

Abstract

Nowadays, exchanging data in XML format has become increasingly popular and widely used, because XML documents are simple to maintain and easy to transfer. Accelerating search within such documents is therefore important for the efficiency of search engines. In this paper, we propose a technique for detecting structural similarity between XML documents, and we then cluster the documents using the Delaunay triangulation method. The technique represents the structure of an XML document as a time series in which each occurrence of a tag corresponds to a given impulse. This allows us to use the Discrete Fourier Transform (DFT) as a simple way to analyze these signals in the frequency domain and to build similarity matrices from a distance measure, which are then used to group the documents into clusters. We use Delaunay triangulation to cluster the d-dimensional points that represent the XML documents. The results show significant efficiency and accuracy compared with common methods.
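The encoding and frequency-domain comparison described above can be sketched as follows. This is a minimal sketch under our own assumptions, not the paper's exact encoding: every distinct tag receives an integer code in order of first appearance (the hypothetical `tag_codes` dictionary), elements are visited in document order so that each tag occurrence becomes one impulse, all series are zero-padded to a common length `n`, and documents are compared by the Euclidean distance between their DFT magnitude spectra. The function names are illustrative, and the naive O(n²) DFT is written out for clarity; `numpy.fft.fft` would be the practical choice.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def xml_to_series(xml_text, tag_codes):
    """Encode an XML document as a sequence of impulses, one per element.

    Elements are visited in document (pre-order) order, and each tag occurrence
    contributes an impulse whose height is that tag's integer code. `tag_codes`
    is shared across the corpus so a tag gets the same code in every document
    (a hypothetical encoding, not necessarily the paper's).
    """
    root = ET.fromstring(xml_text)
    series = []
    for elem in root.iter():  # document order: parent before its children
        if elem.tag not in tag_codes:
            tag_codes[elem.tag] = len(tag_codes) + 1  # codes start at 1
        series.append(tag_codes[elem.tag])
    return series


def dft_magnitudes(series, n):
    """|X_k| for k = 0..n-1, after zero-padding or truncating `series` to length n."""
    x = (list(series) + [0] * n)[:n]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def distance_matrix(spectra):
    """Pairwise Euclidean distances between magnitude spectra of equal length."""
    return [[math.dist(a, b) for b in spectra] for a in spectra]


if __name__ == "__main__":
    docs = [
        "<book><title/><author/><author/></book>",
        "<book><title/><author/></book>",
        "<movie><title/><year/><cast><actor/></cast></movie>",
    ]
    codes = {}
    series = [xml_to_series(doc, codes) for doc in docs]  # [[1, 2, 3, 3], [1, 2, 3], [4, 2, 5, 6, 7]]
    n = max(len(s) for s in series)
    spectra = [dft_magnitudes(s, n) for s in series]
    for row in distance_matrix(spectra):
        print([round(v, 2) for v in row])
```

With these three toy documents, the two `book` documents end up much closer to each other than either is to the `movie` document, which is the kind of structural signal the resulting similarity matrix is meant to capture.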

Highlights

The method is based on the structure of XML documents: both the tags and the positions of elements in the XML tree hierarchy are taken into account
The main contributions of our approach are the following steps: 1) mapping each document to a time series; 2) applying the Discrete Fourier Transform (DFT) to transform each time series from the time domain to the frequency domain; 3) mapping the signal of each document to a point in d-dimensional space; 4) triangulating the points corresponding to the documents; 5) clustering the documents based on their triangulation
We use two external metrics, F-Measure and Purity, to evaluate our method
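Steps 3 to 5 can be sketched as follows. The summary does not state how each signal becomes a point or which triangulation-based criterion forms the clusters, so both are assumptions here: each document is represented by its first `d` DFT magnitudes (the hypothetical `spectra_to_points`), and clusters are the connected components that remain after cutting every Delaunay edge longer than the mean edge length plus `factor` standard deviations. This is a generic Delaunay-graph clustering rule, not necessarily the paper's. It relies on `scipy.spatial.Delaunay` (Qhull), which needs at least d + 2 points in general position.

```python
import numpy as np
from scipy.spatial import Delaunay


def spectra_to_points(spectra, d):
    """Hypothetical mapping: use the first d DFT magnitudes of each document as its coordinates."""
    return np.array([list(s)[:d] for s in spectra], dtype=float)


def delaunay_edges(points):
    """Unique undirected edges (i, j), i < j, of the Delaunay triangulation of an m x d array."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        # every pair of vertices of a simplex is an edge of the triangulation
        for i in range(len(simplex)):
            for j in range(i + 1, len(simplex)):
                a, b = sorted((int(simplex[i]), int(simplex[j])))
                edges.add((a, b))
    return edges


def delaunay_clusters(points, factor=1.0):
    """Label points by cutting long Delaunay edges (hypothetical criterion).

    Edges longer than mean + factor * std of all edge lengths are dropped;
    each connected component of the remaining graph becomes one cluster.
    """
    points = np.asarray(points, dtype=float)
    edges = sorted(delaunay_edges(points))
    lengths = np.array([np.linalg.norm(points[a] - points[b]) for a, b in edges])
    cutoff = lengths.mean() + factor * lengths.std()

    parent = list(range(len(points)))  # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), length in zip(edges, lengths):
        if length <= cutoff:  # keep only short edges
            parent[find(a)] = find(b)

    # relabel component roots as 0, 1, 2, ... in order of first appearance
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0.1), (0.2, 1), (1.1, 1.2),
           (10, 10), (11, 10.2), (10.1, 11), (11.2, 11.1)]
    print(delaunay_clusters(pts))
```

In this toy example the edges bridging the two well-separated groups are much longer than the edges inside each group, so they are cut and each group of four points becomes one cluster. A single global length threshold is the simplest choice; it may struggle when clusters differ strongly in density.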

Summary

Introduction

The method is based on the structure of XML documents: both the tags and the positions of elements in the XML tree hierarchy are taken into account. We use two external metrics, F-Measure and Purity, to evaluate our method. More information about this method is given in [1]. The method is evaluated on part of a standard corpus. This corpus comes with its own clustering metric, which we use as a point of comparison for our external metrics. The rest of the paper is organized as follows: In Section 2, we present some information about common methods for detecting similarities and clustering documents.
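Purity and F-Measure compare a clustering against known class labels. The summary does not say which F-Measure variant is used, so the sketch below assumes the common class-weighted form F = Σ_j (n_j / n) max_k F(j, k), where F(j, k) is the harmonic mean of the precision and recall of cluster k with respect to class j; Purity is (1/n) Σ_k max_j |cluster k ∩ class j|. The function names are illustrative.

```python
from collections import Counter


def purity(clusters, classes):
    """Fraction of documents whose class is the majority class of their cluster."""
    per_cluster = {}
    for k, c in zip(clusters, classes):
        per_cluster.setdefault(k, Counter())[c] += 1
    return sum(max(counts.values()) for counts in per_cluster.values()) / len(classes)


def f_measure(clusters, classes):
    """Class-size-weighted best F1 of any cluster for each class (assumed variant)."""
    n = len(classes)
    overlap = Counter(zip(clusters, classes))  # documents shared by cluster k and class c
    cluster_size = Counter(clusters)
    class_size = Counter(classes)
    total = 0.0
    for c, n_c in class_size.items():
        best = 0.0
        for k, n_k in cluster_size.items():
            both = overlap[(k, c)]
            if both:
                precision = both / n_k
                recall = both / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print(purity(clusters, classes))     # 5/6, about 0.833
    print(f_measure(clusters, classes))  # 88/105, about 0.838
```

Both metrics equal 1.0 for a clustering that reproduces the classes exactly; Purity alone can be inflated by many small clusters, which is one reason to report it together with F-Measure.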

Related Work Summary

Implementation Requirements and Procedure

Mapping Each Document to a Time Series

Triangulating the Points Corresponding to Documents

Clustering Documents Based on Their Triangulation

Clustering Evaluation Parameters and Notation

Experimental Results

Conclusions and Future Work

Journal: Applied Mathematics
Publication Date: Jan 1, 2015
Citations: 5
License type: CC BY 4.0

Similar Papers

Updating multivariate calibration with the Delaunay triangulation method: The creation of a new local model
L Jin ... D.L Massart
Chemometrics and Intelligent Laboratory Systems | VOL. 80 | 08 Sep 2005

Multivariate Calibration with the Delaunay Triangulation Method: Definition of the Calibration Domain
L Jin ... Q S Xu
Spectroscopy Letters | VOL. 38 | 01 Nov 2005

Delaunay triangulation method for multivariate calibration
L Jin ... D.L Massart
Analytica Chimica Acta | VOL. 488 | 20 Jun 2003

Integrating time signals in frequency domain – Comparison with time domain integration
Anders Brandt ... Rune Brincker
Measurement | VOL. 58 | 16 Sep 2014
