Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets

Joven A Tolentino, Bobby D Gerardo

doi:10.18517/ijaseit.9.3.6005

Abstract

Mining high-dimensional data carries a high computational cost, since such datasets comprise thousands of attributes and/or instances. The efficiency of an algorithm, specifically its speed, is often sacrificed when this kind of dataset is supplied to it. The Fuzzy C-Means algorithm is one that suffers from this problem: it requires substantial computational resources whether it processes low- or high-dimensional data. The Netflix ratings, small round blue cell tumors (SRBCTs), and Colon Cancer datasets (52, 308, and 2,000 attributes and 1,500, 83, and 62 instances, respectively) were identified as high-dimensional datasets. The Manhattan distance measure, employing a trigonometric function, was therefore used to enhance the fuzzy c-means algorithm. Results show an increase in the efficiency of processing large amounts of data. With the Euclidean distance measure, clustering the Netflix, Colon Cancer, and SRBCT datasets took 39,296, 38,952, and 85,774 milliseconds, respectively, an average of 54,674 milliseconds. The Manhattan distance measure took 36,858, 36,501, and 82,86 milliseconds, respectively, an average of 52,703 milliseconds. The enhanced Manhattan distance measure took 33,216, 32,368, and 81,125 milliseconds, respectively, an average of 48,903 milliseconds. Given these results, the enhanced Manhattan distance measure is 11% more efficient than the Euclidean distance measure and 7% more efficient than the Manhattan distance measure.
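To make the comparison concrete, the following is a minimal sketch of textbook fuzzy c-means with a pluggable distance function, so the Euclidean baseline and the plain Manhattan variant can be swapped in one argument. It is not the authors' implementation: the function names, the fuzzifier `m=2.0`, the stopping tolerance, and the random initialisation are all assumptions made for illustration. The trigonometric enhancement is not reproduced, because the abstract does not specify it. The center update is the usual weighted mean, kept for simplicity even when the Manhattan distance is used.

```python
import random


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fuzzy_c_means(points, c, dist=manhattan, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a textbook fuzzy c-means run.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][axis] for i in range(n)) / wsum
                 for axis in range(dims)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            # Clamp distances away from zero to avoid division by zero.
            d = [max(dist(points[i], ctr), 1e-12) for ctr in centers]
            new_row = [
                1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(points, c=2, dist=euclidean)` gives the Euclidean baseline, while the default `dist=manhattan` avoids the squaring and square root in each distance evaluation, which is the kind of per-distance saving the abstract's timings compare.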

Journal: International Journal on Advanced Science, Engineering and Information Technology
Publication Date: May 26, 2019
Citations: 1
