Synthetic Test Data Generation for Hierarchical Graph Clustering Methods

László Szilágyi, Levente Kovács, Sándor Miklós Szilágyi

doi:10.1007/978-3-319-12640-1_37

Open Access

https://doi.org/10.1007/978-3-319-12640-1_37

Publication Date: Jan 1, 2014

Citations: 2

Affiliation: Budapest University of Technology and Economics, Óbuda University, Eötvös Loránd University, Universitatea Petru Maior din Tîrgu Mureş

Abstract

Recent achievements in graph-based clustering algorithms have revealed the need for large-scale test data sets. This paper introduces a procedure that provides synthetic but realistic test data for the hierarchical Markov clustering algorithm. Created according to the structure and properties of the SCOP95 protein sequence data set, the synthetic data consist of a collection of proteins organized in a four-level hierarchy and a similarity matrix containing the pairwise similarity values of those proteins. A high-speed TRIBE-MCL algorithm was employed to validate the synthetic data. Because the generation process is randomized, the generated data sets show substantial variability, and they are suitable for testing graph-based clustering algorithms on large-scale data.
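
The kind of data the abstract describes (items placed in a four-level hierarchy, plus a pairwise similarity matrix) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names, the branching factors, the per-depth similarity levels, and the Gaussian noise model are all assumptions chosen for illustration. The level names in the comments follow SCOP's class, fold, superfamily, family ordering, and mapping them onto the paper's four levels is likewise an assumption.

```python
import numpy as np

def make_hierarchy(n_items, branching=(2, 3, 3, 4), seed=0):
    """Assign each item a random path through a four-level tree.

    Columns are assumed to stand for class, fold, superfamily, family.
    The branching factors are illustrative, not taken from SCOP95.
    """
    rng = np.random.default_rng(seed)
    # One column per level; each entry is a child index under its parent.
    return np.column_stack([rng.integers(0, b, size=n_items) for b in branching])

def shared_depth(labels):
    """Count the leading hierarchy levels that each pair of items shares."""
    same = labels[:, None, :] == labels[None, :, :]  # shape (n, n, levels)
    # The cumulative product zeroes every level after the first mismatch.
    return np.cumprod(same, axis=-1).sum(axis=-1)

def make_similarity(labels, base=(0.02, 0.1, 0.3, 0.6, 0.9), noise=0.05, seed=0):
    """Build a symmetric similarity matrix with values in [0, 1].

    base[d] is the assumed mean similarity of two items sharing d levels;
    symmetric Gaussian jitter supplies the run-to-run variability.
    """
    rng = np.random.default_rng(seed)
    sim = np.asarray(base)[shared_depth(labels)]
    jitter = np.triu(rng.normal(0.0, noise, size=sim.shape), 1)
    sim = np.clip(sim + jitter + jitter.T, 0.0, 1.0)
    np.fill_diagonal(sim, 1.0)  # every item is fully similar to itself
    return sim

labels = make_hierarchy(50, seed=1)
S = make_similarity(labels, seed=1)
```

Changing `seed` yields a different data set with the same overall structure, which is the property a clustering benchmark would rely on.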

Full Text