Abstract

Clustering is an important problem in machine learning and has received considerable attention in recent years. Most clustering approaches operate on the entire dataset; that is, they learn label information from all of the data. This becomes a critical challenge for large datasets. This paper proposes a novel Three-phase Labeling algorithm (TPL), based on support vector clustering (SVC), to overcome this problem. TPL consists of three phases: selecting data representatives, clustering those representatives, and then classifying the remaining non-representative data. In the first phase, the support vector clustering process is modified to select qualified data representatives. In the second phase, a spectral technique performs the clustering, using the geometric properties of the feature space, a new metric, and a tuning strategy for the kernel scale. In experiments on real datasets, TPL achieves a clear improvement in accuracy and efficiency over its counterparts, and shows highly competitive clustering performance in comparison with several state-of-the-art methods.
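
The three-phase structure described above can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the modified SVC procedure, the spectral step, the new metric, or the kernel-scale tuning rule, so each phase below uses a simple, clearly labeled stand-in. All function names, the parameter `q` (kernel scale), and `threshold` are assumptions made for illustration only.

```python
import math
import random


def gaussian_kernel(a, b, q):
    """Gaussian kernel exp(-q * ||a - b||^2); q plays the role of the kernel scale."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-q * d2)


def select_representatives(points, n_reps, seed=0):
    """Phase 1 stand-in: farthest-point sampling (NOT the paper's modified SVC)."""
    rng = random.Random(seed)
    reps = [rng.randrange(len(points))]
    # dist[i] = distance from point i to its closest representative so far
    dist = [math.dist(p, points[reps[0]]) for p in points]
    while len(reps) < n_reps:
        nxt = max(range(len(points)), key=dist.__getitem__)
        reps.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return reps


def cluster_representatives(points, reps, q, threshold=0.5):
    """Phase 2 stand-in: connected components of a kernel-similarity graph
    over the representatives (NOT the paper's spectral technique)."""
    labels = {}
    next_label = 0
    for start in reps:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in reps:
                if j not in labels and gaussian_kernel(points[i], points[j], q) >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels  # maps representative index -> cluster label


def classify_rest(points, rep_labels):
    """Phase 3 stand-in: give every point the label of its nearest representative."""
    result = []
    for p in points:
        nearest = min(rep_labels, key=lambda r: math.dist(p, points[r]))
        result.append(rep_labels[nearest])
    return result


def three_phase_labeling(points, n_reps, q, threshold=0.5, seed=0):
    """Run the three phases in order and return one label per input point."""
    reps = select_representatives(points, n_reps, seed)
    rep_labels = cluster_representatives(points, reps, q, threshold)
    return classify_rest(points, rep_labels)
```

The point of the design is that only the representatives go through the clustering step; every other point is labeled by a cheaper classification step, which is where the efficiency gain on large datasets would come from.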
