Abstract

Most scientific domains face an explosion in the volume of data they must collect and process in order to conduct research. This holds both for domains that deal with experimental data, e.g. biology, sociology and astronomy, and for domains that deal with simulation data, e.g. seismology and physics. To maximize the potential outcome of scientific data analysis, data management applications need to fulfil two broad tasks: fast on-demand data processing, and effective storage and consolidation of diverse data collections. Both tasks are generally hard to realize because of (a) the size of the data, (b) the diversity of data formats, (c) conceptual dependencies among the data, (d) dispersed data locations, and (e) the intensive and systematic nature of scientific queries. We present the characteristics of big scientific data collections and their data management requirements. Building on this discussion, we outline the structure of a framework for processing and consolidating heterogeneous scientific data collections. Such a framework aims to mediate between the user and a set of available data management technologies, such as relational DBMSs, key-value stores and column stores, in order to efficiently direct data management operations (insertions, updates) and, especially, requests (queries) to the appropriate data management application. The framework aims to distribute, dissect and schedule data management actions, and to integrate their results, in a way that reduces response time. This requires methods for selective parallelism and serialization, depending on partial results and response times. It also requires methods for gradually changing data formats and storage, e.g. moving semi-structured data or raw data held in files into relational databases. Furthermore, we discuss the processing of bulks of scientific queries, or query workflows, with the possibility of retrieving early partial results and calibrating query parameters.
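The mediation idea (directing each request to an appropriate back end and returning results as they become available) can be illustrated with a minimal sketch. It assumes a hypothetical routing policy keyed only on the request kind; `Request`, `ROUTES`, `route`, `run_bulk` and the three back-end functions are illustrative stand-ins, not part of the framework described here, and the back ends merely echo the request instead of contacting a real DBMS, key-value store or column store.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical data management request (query, insertion or update)."""
    kind: str     # assumed categories, e.g. "point_lookup", "aggregate", "join"
    payload: str  # opaque request text; its meaning is back-end specific


# Hypothetical stand-ins for real back ends: each returns a tagged echo.
def relational_dbms(req: Request) -> str:
    return f"rdbms:{req.payload}"


def key_value_store(req: Request) -> str:
    return f"kv:{req.payload}"


def column_store(req: Request) -> str:
    return f"col:{req.payload}"


# Assumed routing policy: the request kind decides the "appropriate" back end.
ROUTES: Dict[str, Callable[[Request], str]] = {
    "point_lookup": key_value_store,
    "aggregate": column_store,
    "join": relational_dbms,
}


def route(req: Request) -> Callable[[Request], str]:
    # Fall back to the relational DBMS for kinds without a dedicated rule.
    return ROUTES.get(req.kind, relational_dbms)


def run_bulk(requests: List[Request], workers: int = 4) -> Iterator[Tuple[Request, str]]:
    """Dispatch independent requests in parallel and yield each result as soon
    as it completes, so the caller can inspect early partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(route(r), r): r for r in requests}
        for fut in as_completed(futures):
            yield futures[fut], fut.result()
```

Because `run_bulk` is a generator, a caller could examine the first results of a query bulk and, for example, adjust parameters for later requests before the remaining ones finish; dependent requests would instead have to be serialized, which this sketch does not model.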
