Abstract
We present a system architecture that uses high-efficiency rather than high-performance processors, NAND flash as byte-addressable main memory, and high-speed DRAM as a cache front-end for the flash. The main memory system is interconnected and presents a unified global address space to the client microprocessors. A single cabinet contains 2,550 nodes, networked in a highly redundant modified Moore graph that yields a bisection bandwidth of 9.1 TB/s and a worst-case latency of four hops between any two nodes. Per cabinet, the system supports a minimum of 2.6 petabytes of main memory, dissipates 90 kW, and achieves 2.2 PetaFLOPS. The architecture provides several features desirable in today’s large-scale systems: a global shared physical address space (with optional support for a global shared virtual address space), the ability to partition the physical space unequally among clients as in a unified cache architecture (e.g., to support multiple VMs in a datacenter), pairwise system-wide sequential consistency on user-specified address sets, built-in checkpointing via journaled non-volatile main memory, memory cost-per-bit approaching that of NAND flash, and memory performance approaching that of pure DRAM.
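As a rough sanity check on the per-cabinet figures, the sketch below divides them evenly across the 2,550 nodes. It also computes the Moore bound, which is the largest number of nodes any graph with maximum degree d and diameter k can have. This gives a lower bound on the node degree needed to reach all 2,550 nodes within four hops. The abstract does not state the actual node degree or how power and memory are distributed, so the even split and the helper names (`moore_bound`, `min_degree`, `per_node_averages`) are illustrative assumptions. It also treats 2.6 PB as a decimal petabyte and takes that minimum figure at face value.

```python
# Back-of-envelope arithmetic on the per-cabinet figures quoted in the abstract.
# ASSUMPTION: resources are split evenly across nodes; units are decimal (SI).
NODES = 2_550
MEMORY_BYTES = 2.6e15   # "a minimum of 2.6 petabytes" of main memory
PEAK_FLOPS = 2.2e15     # 2.2 PetaFLOPS
POWER_WATTS = 90e3      # 90 kW
DIAMETER = 4            # worst-case hops between any two nodes


def moore_bound(degree: int, diameter: int) -> int:
    """Upper bound on node count for a graph with this max degree and diameter:
    1 + d + d(d-1) + ... + d(d-1)^(k-1)."""
    total = 1
    frontier = degree  # nodes newly reachable at the current hop count
    for _ in range(diameter):
        total += frontier
        frontier *= degree - 1
    return total


def min_degree(nodes: int, diameter: int) -> int:
    """Smallest degree whose Moore bound can accommodate `nodes` nodes."""
    degree = 2
    while moore_bound(degree, diameter) < nodes:
        degree += 1
    return degree


def per_node_averages() -> dict:
    """Hypothetical even split of cabinet-level resources across nodes."""
    return {
        "memory_bytes": MEMORY_BYTES / NODES,
        "flops": PEAK_FLOPS / NODES,
        "watts": POWER_WATTS / NODES,
    }


if __name__ == "__main__":
    avg = per_node_averages()
    print(f"memory per node:  {avg['memory_bytes'] / 1e12:.2f} TB")
    print(f"compute per node: {avg['flops'] / 1e9:.0f} GFLOPS")
    print(f"power per node:   {avg['watts']:.1f} W")
    print(f"efficiency:       {PEAK_FLOPS / POWER_WATTS / 1e9:.1f} GFLOPS/W")
    print(f"min degree for {NODES} nodes at diameter {DIAMETER}: "
          f"{min_degree(NODES, DIAMETER)}")
```

Under these assumptions, each node averages roughly 1 TB of memory, about 860 GFLOPS, and about 35 W. The cabinet as a whole delivers roughly 24 GFLOPS/W. The Moore bound for degree 7 at diameter 4 is 1,814 nodes, which is too few, while degree 8 allows 3,201. Any four-hop network of 2,550 nodes therefore needs some nodes of degree at least 8. True Moore graphs of diameter 4 do not exist for degree above 2, which is presumably why the abstract describes a *modified* Moore graph.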