Element Retrieval Using Namespace Based on Keyword Search over XML Documents

Yang Wang, Xiaodi Huang, Zhikui Chen

doi:10.4236/jsea.2010.31008

Open Access

https://doi.org/10.4236/jsea.2010.31008

Abstract

Keyword search over XML elements is steadily gaining popularity. Traditional similarity measures are widely used to retrieve XML documents effectively, and a number of authors have proposed similarity measures that take advantage of the structure and content of XML documents. However, these measures do not consider the similarity between the latent semantic information of element texts and that of the keywords in a query. Moreover, although many algorithms for XML element search are available, some of them have high computational complexity because they search over a huge number of elements. In this paper, we propose a new algorithm that uses the semantic similarity between elements rather than between entire XML documents, considering not only the structure and content of an XML document but also the semantic information of the namespaces in its elements. We compare our algorithm with three other algorithms on real datasets. The experiments demonstrate that the proposed method improves query accuracy and reduces running time.

Highlights

Keyword search querying over XML elements has emerged as one of the most effective paradigms in information retrieval
We provide an example XML document, record.xml (Figure 3), which consists of many elements with namespace 'c', denoting the semantic "computer", and namespace 'n', denoting "joy"
This paper addresses the keyword search over elements in XML documents

Summary

Introduction

Keyword search querying over XML elements has emerged as one of the most effective paradigms in information retrieval. Some authors calculate the similarity between the content of XML documents and the query, analyzing only the content and structure of the XML (e.g., [1,2,3]). Many algorithms calculate the degree to which the text of elements matches the keywords in order to produce a ranked result list (e.g., the DIL query processing algorithm [4] and the Top-k algorithm [5]). The classical methods use the TF-IEF formula to calculate the cosine similarity between elements and the query (e.g., Tae-Soon Kim et al. [6]; Maria Izabel M et al. [7]; Yun-tao Zhang et al. [8]). We modify it to deal with the element overlap that occurs in keyword search results.
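
To make the classical baseline concrete, the following is a minimal sketch of ranking elements by the cosine similarity of TF-IEF vectors. It is not the paper's modified formula: it assumes that TF-IEF weights a term by its frequency in an element multiplied by log(N / ef), where N is the number of elements and ef is the number of elements containing the term. The function names, the whitespace tokenizer, and the sample element texts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def ief(term, element_tokens):
    """Inverse element frequency, assumed here to be log(N / ef)."""
    n = len(element_tokens)
    ef = sum(1 for tokens in element_tokens if term in tokens)
    return math.log(n / ef) if ef else 0.0


def tf_ief_vector(tokens, vocabulary, element_tokens):
    """Weight each vocabulary term by term frequency times IEF."""
    counts = Counter(tokens)
    return [counts[term] * ief(term, element_tokens) for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_elements(elements, query):
    """Return (element name, score) pairs sorted by descending similarity."""
    element_tokens = [tokenize(text) for text in elements.values()]
    vocabulary = sorted({t for tokens in element_tokens for t in tokens})
    query_vector = tf_ief_vector(tokenize(query), vocabulary, element_tokens)
    scores = {
        name: cosine(
            tf_ief_vector(tokenize(text), vocabulary, element_tokens),
            query_vector,
        )
        for name, text in elements.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical element texts, not taken from the paper's datasets.
elements = {
    "title": "xml keyword search",
    "abstract": "semantic similarity of xml elements",
    "note": "joy of cooking",
}
for name, score in rank_elements(elements, "keyword search"):
    print(f"{name}: {score:.3f}")
```

In this sketch only the element that shares terms with the query receives a non-zero score. A purely lexical measure of this kind cannot relate a query keyword to an element whose text is semantically close but uses different words, which is the gap the namespace-based approach is intended to address.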

Objectives

Methods

Conclusion