Abstract

This paper discusses fault tolerance in the Local Area Multiprocessor (LAMP) storage subsystem and presents its architecture, its error detection and recovery algorithms, and its logical volume reconstruction procedure. LAMP is a network of workstations with shared physical memory; its basic communication protocol is load and store. The LAMP storage subsystem is designed for a class of distributed computing systems that 1) have distributed shared memory, 2) use a low-latency, high-bandwidth interconnect, and 3) provide remote DMA support. The subsystem stripes data across multiple nodes for higher I/O performance and availability. It organizes logical volumes (virtual disks) to store files according to file size, data access pattern, and other criteria such as performance, availability, and security requirements. The subsystem implements RAID levels 0, 1, and 5 for each logical volume. Write-ahead logging records the data, metadata, and parity updates of a recovery unit, which allows the subsystem to perform fast error recovery. A logical volume reconstruction algorithm is implemented for rapid reconstruction of a failed logical volume. Three main fault tolerance issues of the LAMP storage subsystem are discussed: system configurability for fault tolerance and performance, fast error detection and recovery, and fast logical volume reconstruction.
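As a rough illustration of the RAID-5 idea mentioned above, the following minimal sketch shows how an XOR parity block lets a stripe survive the loss of one node's block. It is not the LAMP reconstruction algorithm, which the abstract does not describe; the function names (`xor_blocks`, `make_stripe`, `reconstruct`) are hypothetical, blocks are assumed to be of equal length, and parity rotation across nodes is omitted for brevity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block.

    Assumption: each block would live on a different node; here the
    stripe is just a Python list.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, failed_index):
    """Rebuild the block at failed_index from the surviving blocks.

    XOR-ing all surviving blocks (data and parity) yields the missing
    one, because every byte position XORs to zero across a full stripe.
    """
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Three data blocks plus parity; any single block can be lost and rebuilt.
stripe = make_stripe([b"node", b"data", b"lamp"])
rebuilt = reconstruct(stripe, 1)
```

In this sketch, losing any single block of the stripe, including the parity block itself, is recoverable; losing two blocks at once is not, which is the usual limit of single-parity schemes.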
