Abstract

The problem of identifying the data that contributed to a query answer is referred to as lineage tracing. While this has been studied extensively in data warehouse systems, it has been identified as a research topic in the mediator-based approach to information integration [10]. A main problem in this context is that a mediator does not store data, and hence for query processing and tracing, it has to communicate with the data sources. While this communication could be expensive, the real issue is that in some situations, after a query has been processed, lineage tracing may become more difficult, e.g., when the schema of a source has changed, or even impossible, e.g., when a source becomes unavailable. In this paper, we study the lineage tracing problem in mediator-based systems and propose a solution that collects "enough" data and metadata during query processing so that tracing remains possible in such situations. We have developed a system prototype, called ELIT (for Exploration and LIneage Tracing). To allow more flexibility, ELIT supports lineage tracing in two modes: batch and interactive. Due to the distributed nature of the context, efficiency is of primary concern for practical reasons. We therefore investigate ways to reduce the overhead that lineage tracing adds to query processing in the proposed framework. Using some basic query optimization techniques in ELIT, our preliminary experimental results show a considerable increase in efficiency. This indicates that the ideas proposed in the ELIT framework could lend themselves to powerful lineage tracing and data analysis tools by incorporating more sophisticated query optimization techniques.

Keywords: Query Processing, Transformation Function, User Query, Query Evaluation, Atomic Data. These keywords were added by machine and not by the authors.

