Efficient LCA based keyword search in XML data

Yu Xu, Yannis Papakonstantinou

doi:10.1145/1321440.1321597

Abstract

Keyword search in XML documents based on the notion of lowest common ancestors (LCAs) and modifications of it has recently gained research interest [2, 3, 4]. In this paper we propose an efficient algorithm, called Indexed Stack, to find answers to keyword queries based on XRank's LCA semantics [2]. The complexity of the Indexed Stack algorithm is $O(kd|S_1|\log|S|)$, where $k$ is the number of keywords in the query, $d$ is the depth of the tree, and $|S_1|$ ($|S|$) is the number of occurrences of the least (most) frequent keyword in the query. In comparison, the best worst-case complexity of the core algorithms in [2] is $O(kd|S|)$. We analytically and experimentally evaluate the Indexed Stack algorithm and the two core algorithms in [2]. The results show that the Indexed Stack algorithm outperforms the other algorithms by orders of magnitude in both CPU and I/O costs when the query contains at least one low-frequency keyword along with high-frequency keywords.
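To give some intuition for where a factor of the form $|S_1|\log|S|$ can come from, here is a minimal Python sketch. It is not the Indexed Stack algorithm itself. It only illustrates the general idea of probing a large sorted keyword list by binary search once per occurrence of the rarest keyword, instead of scanning the large list in full. The sketch assumes nodes are identified by Dewey IDs stored as tuples and kept in document order; this representation and the function names `common_prefix` and `deepest_lca_with` are hypothetical and do not come from the abstract.

```python
from bisect import bisect_left


def common_prefix(a, b):
    """Dewey ID of the lowest common ancestor of nodes a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def deepest_lca_with(node, sorted_ids):
    """Deepest LCA of `node` with any ID in `sorted_ids`.

    `sorted_ids` must be non-empty and sorted in document order
    (lexicographic order on Dewey tuples). Only the two neighbours
    of `node` in that order need to be checked, so one probe costs
    O(log |S|) comparisons rather than a scan of the whole list.
    """
    i = bisect_left(sorted_ids, node)
    # Predecessor (index i - 1) and successor (index i), where they exist.
    candidates = sorted_ids[max(i - 1, 0):i + 1]
    return max((common_prefix(node, c) for c in candidates), key=len)


# Illustrative data only: one rare keyword occurrence, four frequent ones.
rare = [(0, 1, 2)]
frequent = [(0, 0, 1), (0, 1, 0), (0, 1, 3, 1), (0, 2)]

# One binary-search probe per occurrence of the rare keyword.
lcas = [deepest_lca_with(node, frequent) for node in rare]
```

With $k$ keywords, each of the $|S_1|$ rare occurrences is probed against $k - 1$ other lists, and each Dewey comparison touches at most $d$ components, which is consistent with the shape of the stated bound. A full-scan approach would instead read every entry of the largest list.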

Full Text