For annotation in cancer genomic medicine, oncologists have to consult various knowledge bases worldwide and retrieve all information (e.g., drugs, clinical trials, and academic papers) related to a gene variant. However, oncologists find it difficult to search these knowledge bases comprehensively because the terminology for diseases, drugs, and genes has many paraphrases, including abbreviations and foreign-language terms. In this paper, we propose a novel search method that takes deep paraphrases into account, helping oncologists retrieve essential annotation resources swiftly and effortlessly. Our method recursively finds paraphrases based on paraphrase corpora, expands a source document, and finally generates a paraphrase lattice. The proposed method also feeds back useful information about the paraphrases applied in a search, which helps users select search results and formulate the query for the next search. Experimental results demonstrated that our method could retrieve important annotation information that could not be retrieved using a conventional search system or simple paraphrasing. Additionally, annotation experts evaluated our method and found it to be practical.
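
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: a toy paraphrase corpus held in a dictionary, token-level matching, and a depth limit on the expansion. The corpus entries and the names `PARAPHRASES`, `deep_paraphrases`, `build_lattice`, and `search` are hypothetical, and the recursion is written as an equivalent breadth-first loop. It shows how chained paraphrases can be expanded into lattice edges over a source document, with the chain of applied paraphrases kept so it can be reported alongside each hit.

```python
from collections import deque

# Hypothetical toy paraphrase corpus (assumption; not taken from the paper).
PARAPHRASES = {
    "nsclc": ["non-small cell lung cancer"],
    "non-small cell lung cancer": ["non-small cell lung carcinoma"],
    "egfr": ["erbb1"],
}

def deep_paraphrases(term, corpus, max_depth=3):
    """Follow paraphrase links up to max_depth steps from term.

    Returns {paraphrase: chain}, where chain lists the terms traversed
    from the original term to that paraphrase.
    """
    chains = {term: [term]}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for alt in corpus.get(current, []):
            if alt not in chains:  # skip terms already reached (avoids cycles)
                chains[alt] = chains[current] + [alt]
                queue.append((alt, depth + 1))
    return chains

def build_lattice(tokens, corpus, max_depth=3):
    """Expand each token into lattice edges (start, end, text, chain)."""
    edges = []
    for i, tok in enumerate(tokens):
        found = deep_paraphrases(tok.lower(), corpus, max_depth)
        for text, chain in found.items():
            edges.append((i, i + 1, text, chain))
    return edges

def search(query, tokens, corpus, max_depth=3):
    """Return (token position, paraphrase chain) for edges matching the query."""
    q = query.lower()
    return [
        (start, chain)
        for start, _end, text, chain in build_lattice(tokens, corpus, max_depth)
        if text == q
    ]

doc = ["NSCLC", "with", "EGFR", "mutation"]
hits = search("non-small cell lung carcinoma", doc, PARAPHRASES)
```

In this sketch the query matches the document only through a two-step paraphrase chain, and the returned chain is the kind of feedback a user could inspect when judging a result or rewording the next query. Setting `max_depth=1` corresponds to simple, single-step paraphrasing and would miss this hit.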