Abstract

Contemporary storage systems increasingly offer schema flexibility and support for semi-structured data models. This is the case for document-oriented databases, which can therefore ingest data from heterogeneous sources (IoT, sensors, monitoring). The growing influx of data further emphasizes the need for horizontal and elastic scalability, which NoSQL document stores attain by simplifying query functionality and relaxing transactional properties, e.g. through eventual consistency. The most compelling benefits of document stores are attained when data is stored in a denormalized form (De-NF). For example, one can store a relationship as an embedded copy to increase read query performance and thereby avoid costly cross-node lookups. Compared to the normalized form (NF), such designs come at the cost of additional data duplication, extra effort to keep the copies consistent, and decreased write and update performance. Determining the most appropriate data model for an application, however, depends on many factors. The application developer faces the complexity of designing document data models that are optimized in terms of performance, scalability, storage, and memory size, all of which require in-depth knowledge of the technology, the data meta-model, query plans, and expected workloads. In this paper, we first discuss factors that impact data schema design in document stores, such as the nature of the document and its attributes, horizontal partitioning, index selection, workload variability, and data uniformity. Although some data model design support tools exist, none systematically takes all of these factors into account. We then outline our vision and roadmap towards systematic schema design support and tooling that involves (i) leveraging heuristics and common tactics to generate a finite number of candidate data models and (ii) ranking these candidate data models by means of cost functions that express their cost-effectiveness.
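
To make the trade-off between NF and De-NF and the idea of cost-based ranking concrete, the following is a minimal sketch and not the cost model proposed in the paper. It assumes a hypothetical order/customer relationship, two hand-written candidate models, and an illustrative linear cost function; the names `Workload`, `READ_COST`, `WRITE_COST`, `candidate_costs`, and `rank_candidates` are all invented for this example, and the cost weights are arbitrary.

```python
from dataclasses import dataclass

# Two hypothetical candidate models for the same order/customer relationship.
# NF: the order stores only a reference to the customer document.
order_nf = {"_id": "o1", "total": 42.0, "customer_id": "c1"}
# De-NF: the order embeds a copy of the customer attributes it needs.
order_denf = {"_id": "o1", "total": 42.0,
              "customer": {"_id": "c1", "name": "Ada", "city": "Leuven"}}

# Assumed relative costs of one document read and one document write.
READ_COST = 1.0
WRITE_COST = 2.0


@dataclass(frozen=True)
class Workload:
    reads_per_sec: float    # reads that need an order together with its customer
    updates_per_sec: float  # updates to a customer's attributes
    avg_copies: float       # average number of orders embedding one customer


def candidate_costs(w: Workload) -> dict[str, float]:
    """Return an illustrative cost per second for each candidate model."""
    return {
        # NF: every read fetches two documents; an update touches one document.
        "NF": 2 * w.reads_per_sec * READ_COST + w.updates_per_sec * WRITE_COST,
        # De-NF: every read fetches one document; an update rewrites every copy.
        "De-NF": w.reads_per_sec * READ_COST
                 + w.avg_copies * w.updates_per_sec * WRITE_COST,
    }


def rank_candidates(w: Workload) -> list[str]:
    """Order the candidate models from cheapest to most expensive."""
    costs = candidate_costs(w)
    return sorted(costs, key=costs.get)


read_heavy = Workload(reads_per_sec=1000, updates_per_sec=1, avg_copies=10)
write_heavy = Workload(reads_per_sec=10, updates_per_sec=100, avg_copies=50)
print(rank_candidates(read_heavy))
print(rank_candidates(write_heavy))
```

Under these assumed weights, the embedded copy ranks first for the read-heavy workload and the reference ranks first for the update-heavy one. A realistic cost function would also have to cover the other factors listed above, such as partitioning, index selection, and storage size.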
