Abstract

This paper proposes an efficient methodology for clustering HTML documents. Categorizing documents by topic into clusters makes searching easier and more efficient. Search engines can use the technique to return results relevant to a user's query, and online journal sites that maintain large document collections can use it to organize them. The paper suggests a word-matching scheme and a way of naming automatically generated clusters that reduce the time needed to find the appropriate cluster for a document. It also presents a technique for measuring the similarity between documents and assigning each document to a suitable cluster. The resulting clusters can then be used by a multi-document summarization system, which produces a summary for each group of related documents.
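
The abstract does not name the similarity measure or the naming rule, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes term-frequency vectors compared by cosine similarity, a single-pass assignment with a hypothetical `threshold` of 0.3, and cluster names built from the most frequent terms. All function names (`term_counts`, `cosine`, `cluster`, `cluster_name`) are made up for this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def term_counts(html):
    """Return a Counter of lowercase words (3+ letters) found in the HTML text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return Counter(w for w in words if len(w) > 2)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or start a new one.

    The threshold is an assumed value chosen for illustration only.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [doc index]}
    for i, html in enumerate(docs):
        vec = term_counts(html)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"] += vec  # centroid is the summed term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def cluster_name(c, k=2):
    """Name a cluster after its k most frequent terms (an assumed naming rule)."""
    return " ".join(term for term, _ in c["centroid"].most_common(k))
```

Calling `cluster(docs)` on a list of HTML strings returns the clusters in the order they were created, and `cluster_name` gives each one a short label. A single-pass scheme like this depends on document order, so a real system would likely need a more robust weighting (for example TF-IDF) and stop-word handling.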

