Abstract

Although the biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach that finds more precise and complete information with a single query by using predicates instead of keywords for both query and document representation. Predicates are triples: more complex data structures than keywords that carry more structured information. To make optimal use of them, we developed a new predicate-based vector space model and a query-document similarity function with adjusted tf-idf and a boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and a keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0–5 point scale to calculate precision (0 versus higher) and relevance (0–5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was also significantly higher with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate-based versus the keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search.
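
To make the idea of a predicate-based vector space model concrete, the following is a minimal sketch, not the authors' implementation. It assumes predicates are (subject, relation, object) triples, uses a plain tf-idf weight with cosine similarity, and leaves out the adjusted tf-idf and the boost function, whose details the abstract does not give. The example triples and the names `tf_idf_vectors`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: a predicate is a (subject, relation, object) triple.
Predicate = Tuple[str, str, str]
Vector = Dict[Predicate, float]


def idf(n_docs: int, doc_freq: int) -> float:
    """Smoothed inverse document frequency (an assumed, standard form)."""
    return math.log(1 + n_docs / doc_freq)


def tf_idf_vectors(docs: List[List[Predicate]]) -> Tuple[List[Vector], Counter]:
    """Turn each document (a list of predicates) into a sparse tf-idf vector."""
    n_docs = len(docs)
    # Document frequency: in how many documents each predicate occurs.
    df = Counter(p for doc in docs for p in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({p: count * idf(n_docs, df[p]) for p, count in tf.items()})
    return vectors, df


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query: List[Predicate], docs: List[List[Predicate]]) -> List[Tuple[float, int]]:
    """Return (score, document index) pairs, best match first."""
    vectors, df = tf_idf_vectors(docs)
    n_docs = len(docs)
    # Query predicates that never occur in the collection carry no weight.
    q_vec = {p: c * idf(n_docs, df[p]) for p, c in Counter(query).items() if p in df}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical toy collection: each document is a list of extracted predicates.
docs = [
    [("tamoxifen", "treats", "breast cancer"),
     ("tamoxifen", "inhibits", "estrogen receptor")],
    [("breast cancer", "associated_with", "BRCA1")],
    [("aspirin", "treats", "headache")],
]
query = [("tamoxifen", "treats", "breast cancer")]
ranking = rank(query, docs)
```

In this toy collection only the first document contains the queried triple, so it is the only one with a nonzero score. A keyword search for "breast cancer" would also match the second document, which illustrates how matching on whole triples can narrow results.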
