Comparing Clustering Algorithms for the Identification of Similar Pages in Web Applications

Andrea De Lucia, Giuseppe Scanniello, Genoveffa Tortora, Michele Risi

doi:10.1007/978-3-540-73597-7_34

Publication Date: Jan 1, 2007

Citations: 15

Affiliation: University of Salerno, University of Basilicata

#Pages In Web Applications #Clustering Algorithm #Agglomerative Hierarchical Clustering Algorithm #Winner Takes All #Divisive Clustering Algorithm #Partitional Clustering Algorithm #Web Applications #Partitional Clustering #Partitional Algorithm #Structures Of Pages

Abstract

In this paper, we analyze several widely used clustering algorithms for identifying duplicated or cloned pages in web applications. Specifically, we consider an agglomerative hierarchical clustering algorithm, a divisive clustering algorithm, the k-means partitional clustering algorithm, and a partitional competitive clustering algorithm, namely Winner Takes All (WTA). All the clustering algorithms take as input a matrix of the distances between the structures of the web pages. The distance between two pages is computed by applying the Levenshtein edit distance to the strings that encode the sequences of HTML tags of the web pages.

Keywords: clone detection, clustering algorithms, reverse engineering
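The distance computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regular-expression tag extractor, the function names, the complete-linkage merge rule, and the distance threshold are all assumptions made for illustration. The sketch compares lists of tag names directly, rather than first encoding each tag as a character of a string.

```python
import re

def tag_sequence(html):
    """Return the ordered tag names (opening and closing) found in an HTML string.
    Assumption: a simple regex is enough for this illustration; it is not a full HTML parser."""
    return [m.group(1).lower()
            for m in re.finditer(r"<\s*(/?[a-zA-Z][a-zA-Z0-9]*)", html)]

def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (unit cost per
    insertion, deletion, and substitution), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def distance_matrix(pages):
    """Symmetric matrix of pairwise edit distances between page tag sequences."""
    seqs = [tag_sequence(p) for p in pages]
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = levenshtein(seqs[i], seqs[j])
    return dist

def agglomerative(dist, threshold):
    """Hypothetical agglomerative step: repeatedly merge the two closest
    clusters (complete linkage) until the closest pair exceeds `threshold`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Calling `agglomerative(distance_matrix(pages), threshold)` on a list of HTML strings returns groups of page indices whose tag structures lie within the chosen threshold of one another. The same matrix could equally be fed to a divisive, k-means, or WTA procedure, which this sketch does not cover.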
