Using Fuzzy Clustering Powered by Weighted Feature Matrix to Establish Hidden Semantics in Web Documents

Pramod D Patil, Parag Kulkarni

doi:10.14569/ijacsa.2018.090864

Abstract

Digital data on the 'World Wide Web' is growing exponentially. Orthodox clustering algorithms face various challenges, the most frequent of which is uncertainty. Web documents have become heterogeneous and highly complex, and a single web document is related to many others through embedded links. This can be viewed as a one-to-many (1-M) relationship: a particular web document may fit into several domains at once, for example politics, sports, utilities, technology, music, weather forecasting, or e-commerce products. Efficient, effective, context-driven clustering methods are therefore needed. Orthodox, well-established clustering algorithms partition a given data set into exclusive clusters, which means that one can clearly state which single cluster an object belongs to. Such a partition is not sufficient for representing real-world scenarios. We therefore present a fuzzy clustering method that builds clusters with indeterminate boundaries and allows one object to belong to overlapping clusters, each with some membership degree. In other words, the crux of fuzzy clustering is to consider not only whether an object belongs to a cluster but also to what degree it belongs. The aim of this study is to devise a fuzzy clustering algorithm that, with the help of a feature-weighted matrix, increases the probability of multi-domain overlapping of web documents, that is, of one document falling into multiple domains. The features act as a filter that determines which data are extracted from each document, and the matrix is used to compute a threshold value that in turn determines the clustering result.
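The abstract combines three ideas: membership degrees, a feature-weighted matrix, and a threshold that lets one document fall into several clusters. The paper's exact formulation is not given in this section, so the following is only a minimal sketch of standard fuzzy c-means (FCM) with a feature-weighted squared Euclidean distance. It assumes the feature weights are supplied directly and that the threshold is a fixed, user-chosen value rather than one computed from the matrix. The names `weighted_fcm`, `assign_overlapping`, `weights`, and `threshold` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def weighted_sq_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Feature-weighted squared Euclidean distance: sum_f w_f * (x_f - c_f)^2."""
    return sum(wf * (xf - cf) ** 2 for xf, cf, wf in zip(x, c, w))


def weighted_fcm(data: Matrix, weights: Sequence[float], n_clusters: int,
                 m: float = 2.0, iters: int = 100, tol: float = 1e-6,
                 seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fuzzy c-means using feature-weighted distances (illustrative sketch).

    Returns (centers, u), where u[k][i] is the membership degree of document k
    in cluster i and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalised so each document's row sums to 1.
    u: Matrix = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: Matrix = []
    for _ in range(iters):
        # Centers: mean of all documents weighted by membership^m.
        centers = []
        for i in range(n_clusters):
            coeffs = [u[k][i] ** m for k in range(n)]
            den = sum(coeffs)
            centers.append([sum(coeffs[k] * data[k][f] for k in range(n)) / den
                            for f in range(dim)])
        # Memberships: u_ki = 1 / sum_j (D_ki / D_kj)^(1/(m-1)), D = squared distance.
        new_u: Matrix = []
        for k in range(n):
            d = [max(weighted_sq_dist(data[k], c, weights), 1e-12) for c in centers]
            new_u.append([1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1))
                                    for j in range(n_clusters))
                          for i in range(n_clusters)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


def assign_overlapping(u: Matrix, threshold: float) -> List[List[int]]:
    """Put each document in every cluster whose membership reaches the threshold."""
    return [[i for i, degree in enumerate(row) if degree >= threshold] for row in u]


if __name__ == "__main__":
    # Toy two-feature vectors (hypothetical term weights), not data from the paper.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
    centers, u = weighted_fcm(docs, weights=[1.0, 1.0], n_clusters=2)
    for doc, row, clusters in zip(docs, u, assign_overlapping(u, threshold=0.4)):
        print(doc, [round(v, 3) for v in row], clusters)
```

On this toy input, the four documents near the two corners each land in a single cluster. The document halfway between them receives roughly equal membership in both clusters and is therefore assigned to both. Raising the threshold toward 1 makes the assignment behave more like exclusive (hard) clustering, while a larger weight on one feature makes differences along that feature dominate the distance.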

Highlights

The need for, and motivation behind, the proposed system
We study the fuzzy logic implementation presented in this paper
We found that K-Means and hierarchical algorithms do not perform well because they cannot recognize the semantic meaning of a document

Summary

Introduction

With several hundred million websites in circulation worldwide, the ever-changing collection of documents on the internet grows larger every day. This poses some very important and very difficult challenges. Another important aspect is that traditional clustering algorithms use standard NumPy arrays, which are slow and inefficient in terms of time complexity. These traditional clustering algorithms also face the issue of 'Concentration of Measure', or the 'Curse of Dimensionality'. This motivated us to propose a new algorithm that applies the fuzzy logic method using a weighted matrix, with the aim of answering end-user queries correctly. Let us now take a detailed overview of the components of our system and understand the operations it is designed to perform.
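The 'Concentration of Measure' problem can be illustrated independently of the proposed system. As the number of features grows, the distances from a query to all other points become nearly equal, so distance-based clustering loses its ability to separate near from far neighbours. The following minimal sketch measures the relative contrast, defined as (farthest distance minus nearest distance) divided by nearest distance. It assumes points drawn uniformly from the unit hypercube, which is an illustrative choice and not the paper's data. The function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

The printed contrast should shrink sharply as `dim` increases. In a high-dimensional term space, this means a document's nearest and farthest neighbours sit at almost the same distance, which weakens any clustering that relies on raw distances alone.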

Objectives

Results

Conclusion