Abstract

Data generated by modern scientific instrumentation has grown to an unprecedented scale. Moreover, the data formats and computational behaviors of scientific big data workloads are much more complex than those of Internet services. Together, these two facts pose a serious challenge to scientific data management and analytics. Among the many concerns, the first is how to build a comprehensive and representative scientific big data benchmark suite. Previous benchmark efforts either focus on Internet services (e.g., BigDataBench) or target a single scientific domain (e.g., GeneBase). This paper presents our preliminary work on building a comprehensive scientific big data benchmark suite, BigDataBench-S. We also use BigDataBench-S to evaluate several general-purpose big data management systems that were designed for Internet service applications. Our evaluation shows that these systems cannot achieve the expected performance on many scientific workloads, especially complex matrix computation, because they lack appropriate mechanisms and policies for data storage, query optimization, and distributed matrix computation.

