Abstract

Big data processing involves a large amount of data communication among subtasks, much of it intermediate data in the form of key-value pairs. Processing this intermediate data efficiently is a central concern in big data analytics. However, state-of-the-art big data systems rarely address the optimization and efficient management of intermediate data. On the one hand, intermediate data in MapReduce and other batch systems is stored sequentially in files. Because this data cannot be merged automatically, MapReduce needs an extra shuffle phase to aggregate values that share the same key, which adds processing time. On the other hand, some distributed message management systems, such as stream data processing systems, can aggregate intermediate messages effectively, but they operate at a much higher level of abstraction. Such systems usually have a complicated structure, which may degrade performance. In this paper, we propose a B+ tree based data structure, KVBTree, to manage intermediate data effectively. We then propose an efficient cache strategy to increase the storage capability of the KVBTree. We also present strategies for querying, inserting, and traversing the data elements in KVBTree. In the final part of the paper, we evaluate the time complexity and the space complexity of KVBTree. The experimental results show that KVBTree is effective when applied to electric power data.
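
The idea of merging values with the same key at insertion time, rather than in a separate shuffle phase, can be illustrated with a minimal sketch. This is not the paper's KVBTree: it assumes a plain sorted list searched with Python's `bisect` module as a stand-in for an ordered index, and the class name `SortedKVIndex` and its methods are hypothetical names chosen for illustration. It omits the B+ tree node structure and the cache strategy entirely.

```python
import bisect


class SortedKVIndex:
    """Hypothetical stand-in for an ordered key index (not the paper's KVBTree)."""

    def __init__(self):
        self._keys = []    # keys kept in ascending order
        self._values = []  # _values[i] holds every value inserted for _keys[i]

    def insert(self, key, value):
        # Locate the key's position by binary search.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            # Key already present: merge the new value in place.
            self._values[i].append(value)
        else:
            # New key: insert it so that the key order is preserved.
            self._keys.insert(i, key)
            self._values.insert(i, [value])

    def query(self, key):
        # Return a copy of all values stored under key, or an empty list.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return list(self._values[i])
        return []

    def traverse(self):
        # Yield (key, values) pairs in ascending key order.
        for key, values in zip(self._keys, self._values):
            yield key, list(values)


index = SortedKVIndex()
for key, value in [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]:
    index.insert(key, value)

grouped = list(index.traverse())
print(grouped)  # [('a', [1, 5]), ('b', [2, 3]), ('c', [4])]
```

In this sketch the values are already grouped by key when traversal begins, so no separate aggregation pass is needed. A sorted list makes each new-key insertion linear in the number of keys; a tree-based index such as a B+ tree would typically be chosen to avoid that cost.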
