Abstract

The progressive growth in the volume of digital data has become a technological challenge of great interest in computer science. With the worldwide spread of personal computers and networks, content is being generated at ever larger scales and in formats far more diverse than before. Analyzing and extracting relevant knowledge from these large and complex masses of data is particularly interesting, but it first requires techniques that support their resilient storage. Storage systems very often rely on a replication scheme to preserve the integrity of stored data: copies of all information are generated so that individual hardware failures, inherent to any massive storage infrastructure, do not compromise access to what was stored. However, accommodating such copies requires raw storage space often much greater than the information would originally occupy. For this reason, error-correcting codes, or erasure codes, have been adopted; they take a mathematically more refined approach than simple replication and incur a smaller storage overhead than their predecessor techniques. The contribution of this work is a fully decentralized storage strategy that, on average, improves access latency by over 80% for both replicated and encoded data, while reducing by 55% the overhead of a terabyte-sized dataset when encoded, compared to related works in the literature.
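
As a concrete illustration of this overhead difference, the sketch below compares the raw space needed to store one terabyte under three-way replication and under a (k, m) erasure code; the (10, 4) parameters are assumptions made for the example, not the configuration evaluated in this work.

    # Illustrative storage-overhead comparison: 3-way replication vs. a
    # hypothetical (k, m) erasure code. The parameter values are assumptions
    # for the sake of the example, not this paper's evaluated configuration.

    def replication_overhead(copies: int) -> float:
        """Raw bytes stored per byte of user data under n-way replication."""
        return float(copies)

    def erasure_overhead(k: int, m: int) -> float:
        """Raw bytes stored per byte of user data when each object is split
        into k data fragments plus m parity fragments (tolerates losing any m)."""
        return (k + m) / k

    if __name__ == "__main__":
        logical_tb = 1.0  # logical dataset size in TB
        print(f"3-way replication  : {logical_tb * replication_overhead(3):.2f} TB raw")
        print(f"(10, 4) erasure code: {logical_tb * erasure_overhead(10, 4):.2f} TB raw")
        # 3.00 TB vs. 1.40 TB of raw space: encoding preserves fault tolerance
        # with far less redundancy, which is the kind of overhead reduction
        # that motivates the strategy described above.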

Highlights

  • Resilient storage of large-scale data, or Big Data, is one of the major infrastructure-support problems addressed in computer science (Alnafoosi and Steinbach, 2013; Hashem et al., 2015)

  • Among related works we find the Robot storage architecture (Yin et al., 2013), which relies solely on erasure codes for data storage and ignores replication as a combined approach, which, according to recent studies, may be a mistake (Gribaudo et al., 2016)

  • Measurements obtained by varying the size of objects stored with three-way replication or with erasure coding



Introduction

Resilient storage of large-scale data, or Big Data, is one of the major infrastructure-support problems addressed in computer science (Alnafoosi and Steinbach, 2013; Hashem et al., 2015). This means that, when it comes to valuable information, storage systems must be designed in such a way that no data is ever lost, regardless of external faults or factors common to any computational environment, such as hard-disk failures and server crashes. In this sense, many of the existing state-of-the-art technologies use a replication methodology, which consists of entirely copying and storing data at different, often geographically distant, locations, thereby adding a degree of redundancy (Gonizzi et al., 2015).
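
For concreteness, the following minimal sketch shows how replicas of an object could be placed on distinct nodes so that no single failure compromises access to the data; the node names, hash-based placement rule, and three-way replication factor are illustrative assumptions, not the placement policy of any particular system discussed here.

    import hashlib

    # Hypothetical cluster; node names and replication factor are assumptions
    # made only to illustrate replication across distinct locations.
    NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    REPLICAS = 3

    def place_replicas(object_key: str) -> list[str]:
        """Choose REPLICAS distinct nodes for an object, so the loss of any
        single node (or disk) never makes the stored data unavailable."""
        digest = int(hashlib.sha256(object_key.encode()).hexdigest(), 16)
        start = digest % len(NODES)
        return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

    if __name__ == "__main__":
        print(place_replicas("dataset/part-00042"))  # three distinct node names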

