Abstract

An information retrieval performance measure that is interpreted as the percent of perfect performance (PPP) can be used to study the effects of including specific document features, feature classes, or techniques in an information retrieval system. Using this measure, one can assess the relative quality of a new ranking algorithm, evaluate the result of incorporating specific types of metadata or folksonomies from natural language, or determine what happens when terms are modified, for example by stemming or by adding part-of-speech tags. For example, knowing that removing stopwords in a specific system moves performance 5% of the way from the level of random performance to the best possible result is relatively easy to interpret and to use in decision making; the percent-based measure also makes it simple to compute, and to interpret, that 95% of the possible performance remains to be obtained using other methods. The PPP measure as used here is based on the average search length (ASL), a measure of the ordering quality of a set of data, and may be used when evaluating all the documents or just the first N documents in an ordered list of documents. Because the ASL may be computed empirically or estimated analytically, the PPP measure may likewise be computed empirically or estimated analytically. Different levels of upper bound performance are discussed.
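
To make the "percent of the way from random to best" reading concrete, the following is a minimal sketch rather than the paper's own formulation. It assumes that the ASL is the mean 1-based rank of the relevant documents, that random performance corresponds to an ASL of (N + 1) / 2 for N documents, and that the best possible ordering places all r relevant documents first, giving an ASL of (r + 1) / 2. The function names are hypothetical, and the paper discusses other choices of upper bound that this sketch does not cover.

```python
def average_search_length(ranking):
    """Mean 1-based rank of the relevant documents.

    `ranking` is a list of booleans in ranked order (True = relevant).
    This definition of ASL is an assumption made for illustration.
    """
    positions = [i for i, rel in enumerate(ranking, start=1) if rel]
    if not positions:
        raise ValueError("ranking contains no relevant documents")
    return sum(positions) / len(positions)


def percent_of_perfect_performance(ranking):
    """Percent of the distance from random ASL to best ASL that was achieved.

    Assumed baselines: random ASL = (N + 1) / 2, best ASL = (r + 1) / 2.
    """
    n = len(ranking)
    r = sum(1 for rel in ranking if rel)
    asl = average_search_length(ranking)
    asl_random = (n + 1) / 2
    asl_best = (r + 1) / 2
    if asl_random == asl_best:
        # Every document is relevant, so ordering cannot matter.
        raise ValueError("PPP is undefined when all documents are relevant")
    return 100.0 * (asl_random - asl) / (asl_random - asl_best)


if __name__ == "__main__":
    # Relevant documents at ranks 1 and 3 out of 4: halfway from random to best.
    print(percent_of_perfect_performance([True, False, True, False]))
```

Under these assumptions a ranking no better than random scores 0, a perfect ranking scores 100, and a ranking worse than random scores below 0.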
