Abstract

The typical approach to data integration is to start by defining a common mediated schema, and then to map the data sources being integrated to this schema. In Internet-scale data integration tasks, where hundreds or thousands of data sources may provide data relevant to a particular domain, a better approach is to let the user discover the mediated schema and the set of sources to use through an iterative exploration of the space of possible schemas and sources. In this paper, we present μBE, a data integration tool that supports this iterative exploratory process by automatically choosing the data sources to include in a data integration system and defining a mediated schema on these sources. The data integration system desired by the user may depend on several subjective and objective criteria. The user guides μBE towards this system through a series of constrained non-linear optimization problems: in each iteration μBE solves one such problem, and the user modifies the parameters and constraints of the next problem based on the solution found in the current iteration. Our formulation of the optimization problem is designed to make it easy for the user to provide such feedback, and a simple, intuitive user interface helps the user in this process. We experimentally demonstrate that μBE is efficient and finds high-quality data integration solutions.
