RDF (Resource Description Framework) is a data model widely used to construct knowledge bases, and SPARQL (SPARQL Protocol and RDF Query Language) is the standard structured query language for manipulating RDF data. Recently, many data providers have published their RDF datasets at their own autonomous sites and exposed SPARQL query interfaces; such sites are called RDF sources. To integrate multiple RDF sources, researchers have proposed federated RDF systems, which support federated SPARQL queries. However, existing studies efficiently support only basic queries, not top-k queries. To this end, we propose a cost-driven top-k query optimization approach for federated RDF systems that supports top-k queries with both single-variable ordering and expression ordering. First, we propose an optimized query decomposition method that decomposes a federated query into multiple subqueries. Second, taking the top-k operator into account, we propose a cost model that estimates the query cost and join cost of subqueries; the optimal query plan is then obtained by a cost-based query plan generation algorithm. Finally, exploiting the characteristics of top-k queries, we develop an incremental query plan execution strategy to minimize the total query cost. Experimental results show that the proposed method is effective, efficient, and scalable.
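
To make the two classes of top-k queries concrete, the following Python sketch ranks joined results from two RDF sources, first by a single variable and then by an expression over two variables. It is only an illustration under stated assumptions: the data, the variable names (`film`, `rating`, `votes`), and the helpers `join_on` and `top_k` are hypothetical, and the naive "join everything, then rank" strategy shown here is the baseline that a cost-driven, incremental plan would aim to improve on, not the method proposed in this work.

```python
import heapq

# Hypothetical variable bindings returned by two RDF sources (illustrative data only).
source_a = [
    {"film": "f1", "rating": 8.1},
    {"film": "f2", "rating": 7.4},
    {"film": "f3", "rating": 9.0},
]
source_b = [
    {"film": "f1", "votes": 1200},
    {"film": "f2", "votes": 5000},
    {"film": "f3", "votes": 300},
]


def join_on(left, right, var):
    """Hash join of two binding lists on a shared variable."""
    index = {}
    for row in right:
        index.setdefault(row[var], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[var], [])]


def top_k(bindings, k, key):
    """Return the k bindings with the largest key value (ORDER BY DESC(...) LIMIT k)."""
    return heapq.nlargest(k, bindings, key=key)


joined = join_on(source_a, source_b, "film")

# Single-variable ordering, analogous to: ORDER BY DESC(?rating) LIMIT 2
by_rating = top_k(joined, 2, key=lambda b: b["rating"])

# Expression ordering, analogous to: ORDER BY DESC(?rating * ?votes) LIMIT 2
by_score = top_k(joined, 2, key=lambda b: b["rating"] * b["votes"])

print([b["film"] for b in by_rating])
print([b["film"] for b in by_score])
```

With single-variable ordering, the ranking depends on values from one source only, whereas expression ordering needs values from both sources before any result can be ranked, which is why the two cases are treated separately.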