Abstract

The performance of data processing in distributed information systems strongly depends on the efficient scheduling of the applications that access data at the remote sites. This work assumes a typical model of a distributed information system in which a central site is connected to a number of highly autonomous remote sites. An application started by a user at the central site is decomposed into several data processing tasks to be processed independently at the remote sites. The objective of this work is to find a method for optimizing task processing schedules at the central site. We define an abstract model of data and a system of operations that implements the data processing tasks. Our abstract data model is general enough to represent many specific data models. We show how an entirely parallel schedule can be transformed into a more efficient hybrid schedule in which certain tasks are processed simultaneously while the other tasks are processed sequentially. The transformations proposed in this work are guided by a cost-based optimization model whose objective is to reduce the total data transmission time between the remote sites and the central site. We show how the properties of data integration expressions can be used to find more efficient schedules of data processing tasks in distributed information systems.

Keywords: Distributed information system, data processing, scheduling, data integration, optimization
