Abstract

Purpose – This paper aims to improve the semantic-disambiguation capability of an information-retrieval system by taking advantage of a well-crafted classification tree. The unstructured nature and sheer volume of information accessible over networks have made it considerably harder for users to find relevant information. Many information-retrieval methods have been developed to address this problem, and the keyword-based approach is among the most common. Such an approach often cannot capture the conceptualization behind user needs and content. The result is semantic ambiguity: because of polysemy, the parties to a communication disagree on the meaning of terms, which increases complexity and lowers accuracy in information integration, migration, retrieval and related activities.

Design/methodology/approach – A novel ontology-based search approach, named GeTFIRST (short for Graph-embedded Tree Fostering Information Retrieval SysTem), is proposed to disambiguate keywords semantically. The contribution is twofold. First, a search strategy uses our Graph-embedded Tree (GeT)-based ontology to prune irrelevant concepts and thereby improve accuracy. Second, a path-based ranking algorithm is proposed to incorporate and reward content specificity.

Findings – An empirical evaluation was performed on United States Patent and Trademark Office (USPTO) patent datasets to compare our approach with full-text patent search approaches. The results showed that GeTFIRST handled ambiguous keywords with higher keyword-disambiguation accuracy than traditional search approaches.

Originality/value – The search approach of this paper addresses semantic ambiguity by using our proposed GeT-based ontology and a path-based ranking algorithm.
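The two ideas named in the abstract, pruning concepts outside the relevant part of a classification tree and rewarding more specific content through its path, can be illustrated with a small sketch. The abstract does not specify the GeT ontology's structure or the ranking formula, so everything below is an assumption made for illustration: a plain tree stands in for the ontology, the score is simply the number of edges between the chosen context node and the node a document is filed under, and all node names, documents, and function names are invented. It is not the GeTFIRST algorithm itself.

```python
# Toy classification tree: child -> parent (None marks the root).
# All node names and documents are invented for illustration.
PARENT = {
    "root": None,
    "computing": "root",
    "biology": "root",
    "input_devices": "computing",
    "pointing_devices": "input_devices",
    "rodents": "biology",
}

# Each document is filed under exactly one tree node: id -> (text, node).
DOCS = {
    "d1": ("optical mouse with scroll wheel", "pointing_devices"),
    "d2": ("mouse and keyboard bundle", "input_devices"),
    "d3": ("mouse model for gene study", "rodents"),
}


def path_to_root(node):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def search(keyword, context):
    """Keep keyword matches filed under `context`, most specific first."""
    hits = []
    for doc_id, (text, node) in DOCS.items():
        if keyword not in text.split():
            continue  # plain keyword filter
        path = path_to_root(node)
        if context not in path:
            continue  # prune: document lies outside the chosen subtree
        # Assumed specificity reward: edges between context and the doc's node.
        depth_below = len(path) - 1 - path.index(context)
        hits.append((doc_id, depth_below))
    # Sort by descending specificity, then by id for a deterministic order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


print(search("mouse", "computing"))
print(search("mouse", "biology"))
```

In this toy setup the polysemous keyword "mouse" returns only hardware documents under the `computing` context and only the laboratory-animal document under `biology`, and within `computing` the document filed at the deeper node ranks first.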

