Abstract

The improvement of file access performance is a great challenge in real-time cloud services. In this paper, we analyze preconditions of dealing with this problem considering the aspects of requirements, hardware, software, and network environments in the cloud. Then we describe the design and implementation of a novel distributed layered cache system built on the top of the Hadoop Distributed File System which is named HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. The cache services are designed with three access layers an in-memory cache, a snapshot of the local disk, and the actual disk view as provided by HDFS. The files loading from HDFS are cached in the shared memory which can be directly accessed by a client library. Multiple applications integrated with a client library can access a cache service simultaneously. Cache services are organized in the P2P style using a distributed hash table. Every file cached has three replicas in different cache service nodes in order to improve robustness and alleviates the workload. Experimental results show that the novel cache system can store files with a wide range in their sizes and has the access performance in a millisecond level in highly concurrent environments.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.