Abstract

Modern IT environments require data integration systems to handle a large number of heterogeneous data sources. Such systems should perform not only data extraction but also schema alignment, entity resolution, and data fusion. In big data settings, a number of methods address individual aspects of integration, with the aim of making the system automatic and less dependent on the user. This work proposes an extensible approach to developing a data integration system that performs materialized integration of heterogeneous sources in a distributed computing environment. A prototype of the system, implementing advanced methods for big data integration, has been developed and applied in the e-commerce domain.

