Abstract

Many important applications such as recommender systems, e-commerce sites, web crawlers involve dynamic datasets. Dynamic datasets undergo frequent changes in the form of insertion or deletion of data that affects its size. A naive algorithm may not process these frequent changes efficiently as it involves the entire set of data points each time a change is inflicted. Fast incremental algorithms process these updates to datasets efficiently to avoid redundant computation. In this article, we propose incremental extensions to shared nearest neighbor density-based clustering (SNNDB) algorithm for both addition and deletion of data points. Existing incremental extension to SNNDB viz. InSDB cannot handle deletion and handles insertions one point at a time. Our method overcomes both these bottlenecks by efficiently identifying affected parts of clusters while processing updates to dataset in batch mode. We propose three incremental variants of SNNDB in batch mode for both addition and deletion with the third variant being the most effective. Experimental observations on real world and synthetic datasets showed that our algorithms are up to 4 orders of magnitude faster than the naive SNNDB algorithm and about 2 orders of magnitude faster than the pointwise incremental method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call