Abstract

Because noise is generally perceived negatively, it is commonly eliminated during data cleansing, before the analysis itself. Some studies instead employ noise-tolerant or robust algorithms to obtain reliable outcomes. Either way, the impact of noise is minimized, thereby preserving the integrity of the discovered knowledge. On the other hand, ways of making good use of noise have recently been investigated and exploited in different contexts, such as privacy-preserving data mining, single clustering and consensus clustering. Our initial study, which employed uniform random noise during ensemble generation to increase diversity within an ensemble, showed that clustering quality can improve at specific noise levels. To consolidate this finding, this paper investigates a rich collection of random noise functions that can be used to create perturbed variations of the data within the framework of noise-induced ensemble generation. The effectiveness of this approach with different types of random noise is demonstrated on benchmark datasets from the UCI repository. The results suggest that the noise-induced strategy generally outperforms its baseline counterpart, although the improvement is uneven across different data patterns. Accordingly, a guideline is provided for making the best use of the proposed method with any new dataset.
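
The idea of noise-induced ensemble generation can be illustrated with a minimal sketch. It rests on several assumptions: the three noise functions, the k-means base clusterer with farthest-first seeding, and the co-association matrix thresholded at 0.5 as the consensus function are all illustrative choices, not the exact configuration used in the paper, and every function name below is hypothetical. Each ensemble member clusters a freshly perturbed copy of the data, and the members are then combined into a single consensus partition.

```python
import random
from typing import Callable, Dict, List

Point = List[float]
NoiseFn = Callable[[random.Random, float], float]

# Illustrative noise functions (an assumption, not the paper's exact set):
# each draws one zero-mean perturbation whose spread is controlled by `level`.
NOISE_FUNCTIONS: Dict[str, NoiseFn] = {
    "uniform": lambda rng, level: rng.uniform(-level, level),
    "gaussian": lambda rng, level: rng.gauss(0.0, level),
    # Laplace(0, level) sampled as the difference of two exponentials with mean `level`.
    "laplace": lambda rng, level: rng.expovariate(1.0 / level) - rng.expovariate(1.0 / level),
}


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def perturb(data: List[Point], noise: NoiseFn, level: float, rng: random.Random) -> List[Point]:
    """Return a noisy copy of `data`; the original data is left untouched."""
    return [[x + noise(rng, level) for x in row] for row in data]


def kmeans(data: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Lloyd's k-means with farthest-first seeding (hypothetical base clusterer)."""
    centres = [list(rng.choice(data))]
    while len(centres) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generate_ensemble(data: List[Point], k: int, size: int, noise_name: str,
                      level: float, seed: int = 0) -> List[List[int]]:
    """Cluster `size` independently perturbed copies of the data."""
    rng = random.Random(seed)
    noise = NOISE_FUNCTIONS[noise_name]
    return [kmeans(perturb(data, noise, level, rng), k, rng) for _ in range(size)]


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """Fraction of ensemble members in which points i and j share a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus(matrix: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Link pairs whose co-association exceeds `threshold`; return component labels."""
    n = len(matrix)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.25, 0.25],
            [10.0, 10.0], [10.5, 10.0], [10.0, 10.5], [10.5, 10.5], [10.25, 10.25]]
    ensemble = generate_ensemble(data, k=2, size=20, noise_name="uniform", level=0.3)
    print(consensus(co_association(ensemble)))
```

Because each member sees a differently perturbed copy of the data, diversity across the ensemble comes from the noise function and its level rather than from the clusterer alone; swapping `noise_name` or `level` is how one would compare noise functions in this sketch.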
