High Dimensional Cluster Analysis Using Path Lengths

Kevin Mcilhany, Stephen Wiggins

doi:10.4236/jdaip.2018.63007

Abstract

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions. The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering techniques are used, including spectral clustering; however, new techniques are also introduced based on the path length between partitions that are connected to one another. A Line-of-Sight algorithm is also developed for clustering. A test bank of 12 data sets with varying properties is used to expose the strengths and weaknesses of each technique. Finally, a robust clustering technique is discussed based on reaching a consensus among the multiple approaches, overcoming the weaknesses found individually.
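The reduction to partitions and the notion of a path length between connected partitions can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fixed `bin_width`, the rule that two occupied bins are connected when their indices differ by at most one in every dimension, and the use of hop count as the path length are all assumptions made for illustration.

```python
from collections import deque
from itertools import product


def bin_points(points, bin_width):
    """Reduce raw points to partitions: map each point to its bin index tuple."""
    bins = {}
    for p in points:
        key = tuple(int(x // bin_width) for x in p)
        bins.setdefault(key, []).append(p)
    return bins


def neighbours(key):
    """Yield the index tuples of all bins touching `key` (assumed connectivity rule)."""
    for offset in product((-1, 0, 1), repeat=len(key)):
        if any(offset):
            yield tuple(k + o for k, o in zip(key, offset))


def path_lengths(bins, start):
    """Breadth-first search: hop count from `start` to every reachable occupied bin."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nb in neighbours(current):
            if nb in bins and nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    return dist


def cluster_bins(bins):
    """Give the same label to occupied bins joined by a finite path."""
    labels = {}
    next_label = 0
    for key in sorted(bins):
        if key not in labels:
            for reached in path_lengths(bins, key):
                labels[reached] = next_label
            next_label += 1
    return labels


# Hypothetical 2D data: one chain of three occupied bins and one pair far away.
points = [(0.1, 0.2), (0.9, 0.4), (1.5, 1.2), (2.3, 2.1), (7.2, 7.7), (8.1, 8.4)]
bins = bin_points(points, bin_width=1.0)
labels = cluster_bins(bins)
```

Enumerating every touching bin costs 3^d - 1 lookups per bin in d dimensions, so for a genuinely high-dimensional space a practical version would more likely search the list of occupied bins for neighbors than enumerate offsets.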

Highlights

Clustering is a fundamental technique and methodology in data analysis and machine learning
Presenting all clustering techniques plus the four robust consensus results for each test case would require 360 figures; due to space limitations, the full set of clustering results is provided in the supplemental material
Because spectral clustering has many variations, the techniques are identified by an index given in Table 3. In the text, a shorthand is used: ([1] [2], NN1, 2DHIST) denotes the use of the 1st and 2nd eigenvectors, with a Laplacian based on an adjacency matrix derived from the first nearest neighbor matrix NN1, and with the partitions gathered into clusters within the eigenspace using a 2D histogram with 6 × 6 bins
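A minimal NumPy sketch of what the shorthand ([1] [2], NN1, 2DHIST) might correspond to is given below. It is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, the Laplacian is taken to be the unnormalized L = D - A, the "1st and 2nd eigenvectors" are taken to be those with the two smallest eigenvalues, and each occupied cell of the 6 × 6 histogram is treated as one cluster.

```python
import numpy as np


def nn1_adjacency(centers):
    """Symmetric adjacency linking each partition to its single nearest neighbor."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a partition is not its own neighbor
    nearest = dist.argmin(axis=1)
    adj = np.zeros((len(centers), len(centers)))
    adj[np.arange(len(centers)), nearest] = 1.0
    return np.maximum(adj, adj.T)  # symmetrize the nearest-neighbor links


def spectral_labels(centers, n_bins=6):
    """Label partitions by the 2D histogram cell they occupy in eigenspace."""
    adj = nn1_adjacency(centers)
    laplacian = np.diag(adj.sum(axis=1)) - adj  # assumed form: L = D - A
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    # Round away floating-point noise so equal coordinates bin identically.
    coords = np.round(vecs[:, :2], 8)
    _, x_edges, y_edges = np.histogram2d(coords[:, 0], coords[:, 1], bins=n_bins)
    ix = np.clip(np.digitize(coords[:, 0], x_edges) - 1, 0, n_bins - 1)
    iy = np.clip(np.digitize(coords[:, 1], y_edges) - 1, 0, n_bins - 1)
    cells = ix * n_bins + iy  # flatten the 2D cell index
    _, labels = np.unique(cells, return_inverse=True)
    return labels


# Hypothetical partition centers: two well-separated groups of three.
centers = np.array(
    [[0.0, 0.0], [1.0, 0.0], [2.1, 0.0], [10.0, 10.0], [11.0, 10.0], [12.1, 10.0]]
)
labels = spectral_labels(centers)
```

In this example the NN1 graph splits into two disconnected pieces, so the two lowest eigenvectors are constant on each piece and the two groups land in different histogram cells. A fuller version might also merge neighboring occupied cells, which this sketch does not attempt.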

Summary

Introduction

Clustering is a fundamental technique and methodology in data analysis and machine learning. The explosion of the field of data science has led to an expansion in how this notion is applied. In this respect, it would be more appropriate to refer to clustering as data organization, which would encompass the ideas of 1) data reduction, 2) data identification, 3) data clustering, and 4) data grouping. Data reduction is the process of converting raw data into a form that is more amenable to the application of a specific analytical and/or computational methodology. Data clustering is the process of associating data through proximity, similarity, or dissimilarity.

Journal: Journal of Data Analysis and Information Processing	Publication Date: Jan 1, 2018
Citations: 1	License type: CC BY 4.0
