Abstract

Keyword search has been recognised as a viable alternative for information search over semi-structured and structured data sources. Current state-of-the-art keyword-search techniques over relational databases do not take advantage of the correlative meta-information included in structured and semi-structured data sources, and therefore leave relevant answers out. These techniques are also limited by scalability, performance and precision issues that become evident when they are applied to large datasets. Based on an in-depth analysis of issues related to indexing and ranking semi-structured and structured information, we propose a new keyword-search algorithm that takes into account the semantic information extracted from the schemas of the structured and semi-structured data sources and combines it with the textual relevance obtained by a common text retrieval approach. The algorithm is implemented in a keyword-based search engine called KESOSASD (Keyword Search Over Semi-structured and Structured Data), where it improves the engine's precision and response time. Our approach models the semi-structured and structured information as graphs and makes use of a Virtual Document Structure Aware Inverted Index (VDSAII). This index is created from a set of logical structures called Virtual Documents, which capture and exploit the implicit structural relationships (semantics) depicted in the schemas of the structured and semi-structured data sources. Extensive experiments were conducted to demonstrate that KESOSASD outperforms existing approaches in terms of search efficiency and accuracy. Moreover, KESOSASD is prepared to scale out and manage large databases without degrading its effectiveness.
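
To make the idea of an inverted index built over virtual documents more concrete, the following is a minimal sketch and not the authors' VDSAII or the KESOSASD implementation. It assumes that a virtual document can be approximated as a tuple's own text plus the text reachable through one foreign key, and it swaps in a plain TF-IDF score for the ranking function, which the abstract does not specify. The toy relations and every function name are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: two relations linked by a foreign key (author_id).
authors = {1: "Ada Lovelace", 2: "Alan Turing"}
papers = {
    10: {"title": "Notes on the analytical engine", "author_id": 1},
    11: {"title": "Computing machinery and intelligence", "author_id": 2},
    12: {"title": "On computable numbers", "author_id": 2},
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stop words)."""
    return text.lower().split()


def build_virtual_documents(papers, authors):
    """Assumed approximation of a virtual document: one per paper,
    holding the paper's own terms plus the terms of the author row
    reached through the foreign key."""
    vdocs = {}
    for paper_id, row in papers.items():
        vdocs[paper_id] = tokenize(row["title"]) + tokenize(authors[row["author_id"]])
    return vdocs


def build_inverted_index(vdocs):
    """Map each term to {virtual_document_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in vdocs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Rank virtual documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


vdocs = build_virtual_documents(papers, authors)
index = build_inverted_index(vdocs)
results = search("turing computable", index, len(vdocs))
for doc_id, score in results:
    print(doc_id, papers[doc_id]["title"], round(score, 3))
```

In this sketch the query "turing computable" matches paper 12 on both keywords even though "turing" appears only in the related author row, which is the kind of answer a purely per-tuple text index would miss.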

