Abstract
A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary — the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms the highest 20, 40, 60 and 7.5% of the unique words —were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subset of terms produced by the exact and centroid discrimination value were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid). An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.