Block-level cooperation is an endurance management technique that operates on top of error correction mechanisms to extend memory lifetimes. Once an error recovery scheme fails to recover from faults in a data block, the entire physical page associated with that block is disabled and becomes unavailable to the physical address space. To reduce the page waste caused by early block failures, other blocks can support the failed block, working cooperatively to keep it alive and extend the faulty page’s lifetime. We combine the proposed technique with existing error recovery schemes, such as Error Correction Pointers (ECP) and Aegis, to increase memory lifetimes. When combined with ECP, block cooperation is realized through metadata sharing, where one data block shares its unused metadata with another data block. When combined with Aegis, block cooperation is realized through data layout reorganization, where blocks with few faults come to the aid of failed blocks, bringing them back from the dead. Our evaluation using Monte Carlo simulation shows that block cooperation at a single level (or multiple levels) boosts memory lifetimes by 28% (37%) on top of ECP and by 8% (14%) on top of Aegis, on average. Furthermore, trace-driven benchmark evaluation shows that the lifetime boost can reach 68% with metadata sharing and 30% with data layout reorganization.
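As a rough illustration of the metadata-sharing idea, the following minimal sketch checks whether a page survives when a block that has exhausted its own correction pointers may borrow unused ones from another block. Everything concrete here is an assumption made for illustration, not taken from the abstract: the per-block budget of six pointers (`ECP_ENTRIES`), the `Block` class, the rule that each donor helps at most one failed block, and the omission of any extra addressing cost for borrowed pointers.

```python
from dataclasses import dataclass

ECP_ENTRIES = 6  # assumed per-block correction-pointer budget (hypothetical value)


@dataclass
class Block:
    faults: int  # number of failed cells in this block

    def spare(self) -> int:
        # Correction pointers this block does not need for itself.
        return max(0, ECP_ENTRIES - self.faults)

    def deficit(self) -> int:
        # Correction pointers this block lacks to cover all its faults.
        return max(0, self.faults - ECP_ENTRIES)


def page_alive_without_sharing(blocks):
    # Baseline: one unrecoverable block disables the whole page.
    return all(b.faults <= ECP_ENTRIES for b in blocks)


def page_alive_with_sharing(blocks):
    # Assumed single-level cooperation: each failed block borrows from
    # exactly one donor, and each donor supports at most one failed block.
    deficits = sorted((b.deficit() for b in blocks if b.deficit() > 0), reverse=True)
    spares = sorted((b.spare() for b in blocks if b.spare() > 0), reverse=True)
    if len(deficits) > len(spares):
        return False
    # Pairing the largest deficit with the largest spare (and so on) finds
    # a valid one-to-one assignment whenever one exists.
    return all(s >= d for d, s in zip(deficits, spares))


page = [Block(2), Block(8), Block(6), Block(1)]
print(page_alive_without_sharing(page))
print(page_alive_with_sharing(page))
```

In this toy page, the block with eight faults exceeds its own budget, so the page would be disabled without cooperation; with sharing, a lightly faulted block has enough unused pointers to cover the shortfall.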