Abstract
Root- and rule-based terms are structured representations of natural language phrases that can be automatically generated using a combination of statistical and symbolic methods. These terms are able to represent and normalize syntactic information about natural language phrases, making them richer than basic n-grams while greatly reducing the vocabulary size. In this paper, we discuss the use of root- and rule-based terms for information retrieval. We represent documents and queries as collections of root- and rule-based terms and show that this improves conventional information retrieval methods such as Latent Semantic Indexing and Latent Direchlet Allocation. Root- and rule-based terms improve on state of the art evaluation scores for the TREC 2016 clinical decision support track.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have