Information Retrieval with Root- and Rule-Based Terms

Jacob Collard, Ira Monarch, Ram Sriram, Eswaran Subrahmanian, Talapady N Bhat, John Elliott

doi:10.2139/ssrn.3565983

Abstract

Root- and rule-based terms are structured representations of natural language phrases that can be generated automatically using a combination of statistical and symbolic methods. These terms represent and normalize syntactic information about natural language phrases, which makes them richer than basic n-grams while greatly reducing the vocabulary size. In this paper, we discuss the use of root- and rule-based terms for information retrieval. We represent documents and queries as collections of root- and rule-based terms and show that this representation improves conventional information retrieval methods such as Latent Semantic Indexing and Latent Dirichlet Allocation. Root- and rule-based terms improve on state-of-the-art evaluation scores for the TREC 2016 Clinical Decision Support track.
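The retrieval setup summarized above, where documents and queries become bags of terms that are then fed to Latent Semantic Indexing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the term-extraction step has already run, and the term strings in the usage example are hypothetical placeholders rather than real output of a root- and rule-based term generator. It uses raw term counts, a truncated SVD (standard LSI), and cosine similarity; the paper's actual weighting scheme, number of latent dimensions `k`, and preprocessing are not specified here, and the helper names are illustrative.

```python
import numpy as np


def term_document_matrix(docs):
    """Build a raw-count term-by-document matrix from bags of terms."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            matrix[index[term], j] += 1.0
    return matrix, index


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsi_scores(docs, query, k=2):
    """Score every document against the query in a rank-k LSI space.

    Assumption: documents and query are already bags of (hypothetical)
    root- and rule-based terms; no term weighting beyond raw counts.
    """
    matrix, index = term_document_matrix(docs)
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    u_k = u[:, :k]  # top-k left singular vectors span the latent space
    q = np.zeros(matrix.shape[0])
    for term in query:
        if term in index:  # query terms unseen in the collection are dropped
            q[index[term]] += 1.0
    q_latent = u_k.T @ q           # project the query into the latent space
    doc_latent = u_k.T @ matrix    # column j is document j's latent vector
    return [cosine(q_latent, doc_latent[:, j]) for j in range(len(docs))]


if __name__ == "__main__":
    # Hypothetical placeholder terms, not output of the authors' extractor.
    docs = [
        ["chest pain", "myocardial infarction"],
        ["myocardial infarction", "troponin elevation"],
        ["fever", "pneumonia"],
    ]
    print(lsi_scores(docs, ["troponin elevation"], k=2))
```

In this toy collection the first document scores about as high as the second, even though it does not contain the query term, because both share "myocardial infarction"; the third document scores near zero. Because each multi-word phrase is a single vocabulary entry, the matrix has one row per normalized term rather than one per n-gram, which is the kind of vocabulary reduction the abstract refers to.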

Full Text