Abstract

The first part of this paper reports a comparative study of the document classifications produced by the single linkage, complete linkage, group average, and Ward clustering methods. Studies of cluster membership and of the effectiveness of cluster searches support previous findings that the single linkage classifications are rather different from those produced by the other three methods. These latter methods all produce large numbers of small clusters containing just pairs of documents. This finding motivates the work reported in the second part of the paper, which considers the use of clusters consisting of a document together with the document to which it is most similar. A comparison of the use of such clusters with conventional best match searches, using seven document test collections, suggests that the two types of search are of comparable effectiveness but that they retrieve noticeably different sets of relevant documents.
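
As a rough illustration of the clusters considered in the second part, the sketch below pairs each document with the one most similar to it. The abstract does not say which similarity measure or term weighting is used, so the cosine measure over raw term frequencies is an assumption made only for this example, and the names `cosine` and `nearest_neighbour_clusters` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_neighbour_clusters(docs: list[str]) -> list[tuple[int, int]]:
    """Pair each document index with the index of its most similar other document.

    Requires at least two documents. Ties go to the lowest index.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = []
    for i, v in enumerate(vectors):
        best = max(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: cosine(v, vectors[j]),
        )
        clusters.append((i, best))
    return clusters


# Toy documents, invented for illustration only.
docs = [
    "cluster analysis of documents",
    "hierarchical cluster analysis methods",
    "best match retrieval of documents",
    "best match search",
]
pairs = nearest_neighbour_clusters(docs)
```

Each document yields one such cluster, so a collection of N documents gives N two-document clusters, which may overlap when the most-similar relation is not mutual.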
