Abstract

In today's era, with the availability of a huge amount of dynamic information available in world wide web (WWW), it is complex for the user to retrieve or search the relevant information. One of the techniques used in information retrieval is clustering, and then the ranking of the web documents is done to provide user the information as per their query. In this paper, semantic similarity score of Semantic Web documents is computed by using the semantic-based similarity feature combining the latent semantic analysis (LSA) and latent relational analysis (LRA). The LSA and LRA help to determine the relevant concepts and relationships between the concepts which further correspond to the words and relationships between these words. The extracted interrelated concepts are represented by the graph further representing the semantic content of the web document. From this graph representation for each document, the HCS algorithm of clustering is used to extract the most connected subgraph for constructing the different number of clusters which is according to the information-theoretic approach. The web documents present in clusters in graphical form are ranked by using the text-rank method in combination with the proposed method. The experimental analysis is done by using the benchmark datasets OpinRank. The performance of the approach on ranking of web documents using semantic-based clustering has shown promising results.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call