Abstract

Advances in data processing and storage capacity have made managing data across servers increasingly difficult. Data that is neither organized nor labelled makes storage and retrieval harder still, and such untagged data calls for analytic techniques that are both more powerful and more time-efficient. Clustering has long been regarded as one of the most effective methods for managing large amounts of data; nonetheless, conventional clustering methods can show unexpectedly poor accuracy as data volumes grow. In this study, we propose a novel framework for clustering large amounts of data. Because preprocessing is one of the most important parts of data cleansing, a global stop-word list is used to filter the contents of the files before they are passed on to the cluster distribution stage. A meta-heuristic Genetic Algorithm (GA) is used to eliminate redundant information present in the datasets. In addition to the generalized attributable fitness function, an innovative attribute-based fitness function (f) is developed. To determine how well the proposed method performs, it is compared with a variety of alternative clustering approaches. To evaluate the resulting cluster distributions, the standard error (SE), root mean squared error (RMSE), and corrected R-squared error are computed.
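The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: SE is taken to be the standard error of the estimate, "corrected R squared" is taken to mean the usual adjusted R-squared, and the function name `evaluation_metrics` and the `num_predictors` parameter are hypothetical names introduced only for illustration.

```python
import math


def evaluation_metrics(observed, predicted, num_predictors=1):
    """Return RMSE, standard error of the estimate, and adjusted R-squared.

    Assumed definitions (not taken from the paper):
      RMSE        = sqrt(SS_res / n)
      SE          = sqrt(SS_res / (n - p - 1))
      adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
    where p is num_predictors and SS_res is the residual sum of squares.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - num_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")

    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")

    rmse = math.sqrt(ss_res / n)
    se = math.sqrt(ss_res / dof)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / dof
    return {"rmse": rmse, "se": se, "adjusted_r2": adjusted_r2}


# Hypothetical example: observed vs. predicted cluster sizes.
metrics = evaluation_metrics([10, 20, 30, 40], [12, 18, 33, 37])
```

Under these assumptions, lower RMSE and SE and a higher adjusted R-squared would indicate that one clustering method reproduces the reference cluster distribution more closely than another.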
