Term-relevance computations and perfect retrieval performance

W.M Shaw

doi:10.1016/0306-4573(95)00011-5

Abstract

Computing formulas for binary independent (BI) term relevance weights are evaluated as a function of query representations and retrieval expectations in the CF database. Query representations consist of the limited set of terms appearing in each query statement and the complete set of terms appearing in the database. Retrieval expectations include comprehensive searches, for which many relevant documents are sought, and specific searches, for which only a few documents have merit. Conventional computing equations, which are known to over estimate term relevance weights, are shown to produce mediocre results for all combinations of query representations and retrieval expectations. Modified computing equations, which do not over estimate relevance weights, produce essentially perfect retrieval results for both comprehensive and specific searches, when the query representation is complete. Probabilistic retrieval, based on BI assumptions and applied to simple subject descriptions of documents and queries, can retrieve all relevant documents and only relevant documents, when term relevance weights are computed accurately.

Full Text