Abstract
Word stems derived from titles and abstracts are used to represent the 1,239 documents in the cystic fibrosis document collection. Evidence for clustering structure and the effectiveness of cluster-based retrieval are investigated as a function of the exhaustivity of the uncontrolled subject descriptions. Results are compared to equivalent calculations for controlled descriptions based on Medical Subject Headings (MeSH) and subheadings. For both representations, the evidence for clustering structure is inversely related to the effectiveness of cluster-based retrieval. Exhaustive subject descriptions produce the strongest evidence for clustering structure and the lowest levels of retrieval performance. Levels of retrieval performance associated with exhaustive subject descriptions can be explained by assuming that the structure imposed on documents by subject relationships is the result of a random process. Optimal levels of cluster-based retrieval performance can be detected for both representations. The optimal levels of performance provide a clear indication of the relative utility of document representations, and show that controlled and uncontrolled subject descriptions produce equivalent levels of performance and complementary outcomes. High levels of retrieval performance are achieved by optimizing the exhaustivity of document representations for each query. Retrieval performance based on combinations of retrieval outcomes from the subject descriptions is materially superior to the highest performance of each representation. Average levels of recall, precision, and effectiveness are shown to convey little information about typical outcomes. Performance standards for individual queries are suggested.