Abstract

Modern scientific collaborations require large-scale integration of various processes. Higher-level dataflow languages are used on top of parallel and distributed dataflow systems to speed up the development of data-intensive workflow programs, ease their optimization, and yield more maintainable code. In this paper, we present the rationale, design, and application of advanced support for modeling and optimizing data flows in data mining and integration processes. The optimization research and development is based on pre-execution modeling of dataflows and on extending the registry of process activities with advanced annotations. We also describe the overall process of deriving a static model, which serves as input to the optimization algorithms, from a dynamic model. This novel approach is implemented within an advanced graphical user interface, called the Process Designer, to support semi-automatic optimization, and within a dataflow execution platform, called the Gateway. It can be adapted to any dataflow language implementation. The Process Designer architecture, based on modern (meta-)modeling concepts, naturally supports validated transformations between the external textual and internal graphical representations of the targeted dataflow language, and in this way significantly increases the productivity and robustness of the implementation processes.
