Abstract

Storing data in a way that keeps pace with data processing speed across the network is one of the key problems in big data. As computing speed and cluster size increase, the I/O and network operations of data-intensive workloads cannot keep up, and data processing applications suffer latency from long I/O waits. Distributed data storage systems can use web-scale technology to complement centralized data storage in a computing environment and meet the needs of data science. This study analyzes several distributed data storage models, namely NFS, GlusterFS and MooseFS, in order to propose a distributed data storage method. The parameters measured are transfer rate, IOPS and CPU resource usage. Tests of sequential and random data reads and writes show that GlusterFS delivers faster performance, with the best sequential and random read performance at a 64k data storage block size. MooseFS obtains its best random read performance with 64k data storage blocks. NFS achieves its best random write results with 32k data storage blocks. The performance of a distributed data storage system may therefore be affected by the size of the data storage block: using a larger data storage block can give faster data transfer and faster operations on data.
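As an illustration of how transfer rate and IOPS can be measured for sequential and random reads at a given block size, the following is a minimal sketch and not the benchmark procedure used in the study. The function names `measure_reads` and `make_test_file`, the 4 MiB file size, and the use of the system temporary directory are all assumptions made for the example. In practice the target directory would be an NFS, GlusterFS or MooseFS mount point, and the operating system page cache would need to be bypassed or flushed for the numbers to reflect the storage system rather than local memory. CPU resource usage is not covered here.

```python
import os
import random
import tempfile
import time


def make_test_file(directory, size_bytes):
    """Create a file of random bytes in `directory` and return its path."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def measure_reads(path, block_size, sequential=True, seed=0):
    """Read every full block of `path` once.

    Returns (bytes_per_second, iops). Blocks are visited in file order
    when `sequential` is True, otherwise in a shuffled order.
    """
    n_blocks = os.path.getsize(path) // block_size
    offsets = [i * block_size for i in range(n_blocks)]
    if not sequential:
        random.Random(seed).shuffle(offsets)

    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering only; the OS page
    # cache may still serve these reads.
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            total_bytes += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed, n_blocks / elapsed


if __name__ == "__main__":
    # Assumption: replace with the mount point of the file system under test.
    target_dir = tempfile.gettempdir()
    path = make_test_file(target_dir, 4 * 1024 * 1024)
    try:
        for block_size in (32 * 1024, 64 * 1024):
            for sequential in (True, False):
                rate, iops = measure_reads(path, block_size, sequential)
                mode = "sequential" if sequential else "random"
                print(f"{block_size // 1024}k {mode}: "
                      f"{rate / 1e6:.1f} MB/s, {iops:.0f} IOPS")
    finally:
        os.remove(path)
```

Because transfer rate equals IOPS multiplied by block size, a larger block raises the transfer rate whenever the per-operation cost grows more slowly than the block size, which is consistent with the block-size effect reported above.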
