Abstract

We propose a new clustering algorithm called SMIP. It uses a statistical method to determine the clustering parameters automatically, and it applies mathematical morphology to clustering to achieve high speed and accuracy. Building on this, we implement incremental clustering and distributed parallel clustering. Incremental clustering yields significant speed-ups when new data arrive in an already processed database. Distributed parallel clustering runs on multiple workstations connected over a network; it is robust and efficient, with low overhead. SMIP is implemented in Java. Experiments show that SMIP is highly efficient, with a complexity of O(N), where N is the number of points in the database; it is much faster than DBSCAN; it discovers clusters of arbitrary shape; it is insensitive to noise; it can handle high-dimensional points to some extent; incremental clustering is more than 30 times faster than complete re-clustering; and the total overhead of parallel clustering on four workstations is below 13%. SMIP is an ideal clustering method for very large databases.
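The abstract does not describe how mathematical morphology enters the clustering step. As a rough illustration of the general idea only (this is not the SMIP algorithm, and the class name `MorphGridClusterSketch` and the parameters `gridSize` and `minPts` are hypothetical assumptions), the sketch below bins 2-D points into a grid, treats cells holding at least `minPts` points as a binary image, dilates that image with a 3x3 structuring element, and labels connected components. Each point is visited twice, so the point-dependent work is O(N), plus O(G) for a grid of G cells.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch of grid-based clustering with a morphological dilation step.
 * This is NOT SMIP; gridSize and minPts are illustrative, hand-chosen parameters.
 */
public class MorphGridClusterSketch {

    /** Returns a cluster label per point, or -1 for noise. Points are assumed to lie in [0,1) x [0,1). */
    public static int[] cluster(double[] xs, double[] ys, int gridSize, int minPts) {
        int n = xs.length;
        int[] cellOf = new int[n];
        int[] counts = new int[gridSize * gridSize];
        // Pass 1 over the points, O(N): bin each point; out-of-range points are clamped to border cells.
        for (int i = 0; i < n; i++) {
            int cx = Math.min(gridSize - 1, Math.max(0, (int) (xs[i] * gridSize)));
            int cy = Math.min(gridSize - 1, Math.max(0, (int) (ys[i] * gridSize)));
            cellOf[i] = cy * gridSize + cx;
            counts[cellOf[i]]++;
        }
        // Binary image: a cell is "on" when it holds at least minPts points.
        boolean[] dense = new boolean[counts.length];
        for (int c = 0; c < counts.length; c++) dense[c] = counts[c] >= minPts;

        // Dilation with a 3x3 structuring element; it joins dense cells separated by
        // up to two empty cells along a row or column.
        boolean[] dilated = new boolean[dense.length];
        for (int y = 0; y < gridSize; y++) {
            for (int x = 0; x < gridSize; x++) {
                if (!dense[y * gridSize + x]) continue;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int nx = x + dx, ny = y + dy;
                        if (nx >= 0 && ny >= 0 && nx < gridSize && ny < gridSize) {
                            dilated[ny * gridSize + nx] = true;
                        }
                    }
                }
            }
        }

        // Connected-component labeling (4-connectivity) of the dilated image via BFS.
        int[] cellLabel = new int[dense.length];
        Arrays.fill(cellLabel, -1);
        int next = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int start = 0; start < dilated.length; start++) {
            if (!dilated[start] || cellLabel[start] != -1) continue;
            cellLabel[start] = next;
            queue.add(start);
            while (!queue.isEmpty()) {
                int c = queue.poll();
                int x = c % gridSize, y = c / gridSize;
                int[][] nbrs = {{x - 1, y}, {x + 1, y}, {x, y - 1}, {x, y + 1}};
                for (int[] p : nbrs) {
                    if (p[0] < 0 || p[1] < 0 || p[0] >= gridSize || p[1] >= gridSize) continue;
                    int nc = p[1] * gridSize + p[0];
                    if (dilated[nc] && cellLabel[nc] == -1) {
                        cellLabel[nc] = next;
                        queue.add(nc);
                    }
                }
            }
            next++;
        }

        // Pass 2 over the points, O(N): points in sparse cells are treated as noise.
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            labels[i] = dense[cellOf[i]] ? cellLabel[cellOf[i]] : -1;
        }
        return labels;
    }
}
```

In this sketch the size of the structuring element acts somewhat like a neighborhood radius, and `minPts` plays the role of a density threshold; the abstract states that SMIP obtains its parameters statistically, which this example does not attempt to reproduce.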
