Abstract

In this paper, we propose GAD (General Activity Detection), a framework for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (1) an exact GAD algorithm, E-GAD, which is much faster than K-Means and produces the same clustering result; (2) approximate GAD algorithms under different assumptions, which are faster than E-GAD while achieving different degrees of approximation; and (3) GAD-based algorithms that handle the “large clusters” problem, which appears in many large-scale clustering applications. Two existing activity detection algorithms, GT and CGAUTC, are special cases of the framework. The most important contribution of our work is that the framework offers a general solution for exploiting activity detection for fast clustering in both exact and approximate scenarios, and the algorithms we propose within it achieve very high speed. Extensive experiments on several large datasets from various real-world applications show that the proposed algorithms are effective and efficient.
