Abstract

The benefits of a well-constructed thesaurus to an information retrieval system have long been recognized by researchers and practitioners in the field. Previous experiments have investigated manual, semiautomatic, and automatic thesaurus construction. Automatic thesaurus generation in particular has proven especially difficult. This paper examines both early and current approaches to automatic thesaurus construction and describes an approach to the automatic generation of global thesauri based on the term discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. The method has been implemented and applied to two document collections. Preliminary results indicate that the method, which improves retrieval performance by more than 10 and 15 percent in the test collections, is viable and worthy of continued investigation.
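
The abstract names the term discrimination value model but does not say how a discrimination value is computed. As a rough illustration only, the sketch below uses one common centroid-based formulation: the value of a term is the change in the density of the document space (average similarity of documents to their centroid) when that term is removed. The cosine similarity, the toy term-weight vectors, and the function names are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average similarity between each document vector and the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(centroid, d) for d in docs) / n


def discrimination_values(docs):
    """Density with term k removed minus the baseline density, for every term k.

    A positive value means removing the term packs the documents closer
    together, so the term helps tell documents apart; a negative value
    marks a poor discriminator.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        reduced = [d[:k] + d[k + 1:] for d in docs]  # drop column k
        values.append(density(reduced) - base)
    return values


# Hypothetical toy collection: term 0 occurs in every document,
# terms 1 to 3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
dv = discrimination_values(docs)
```

In this toy collection the term shared by all documents receives a negative value, while the terms unique to a single document receive positive values.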
