Non-volatile memories (NVMs) with their high storage density and ultra-low leakage power offer promising potential for redesigning the memory hierarchy in next-generation Multi-Processor Systems-on-Chip (MPSoCs). However, the adoption of NVMs in cache designs introduces challenges such as NVM write overheads and limited NVM endurance. The shared NVM cache in an MPSoC experiences requests from different processor cores and responses from the off-chip memory when the requested data is not present in the cache. Besides, upon evictions of dirty data from higher-level caches, the shared NVM cache experiences another source of write operations, known as writebacks . These sources of write operations: writebacks and responses, further exacerbate the contention for the shared bandwidth of the NVM cache, and create significant performance bottlenecks. Uncontrolled write operations can also affect the endurance of the NVM cache, posing a threat to cache lifetime and system reliability. Existing strategies often address either performance or cache endurance individually, leaving a gap for a holistic solution. This study introduces the Performance Optimization and Endurance Management (POEM) methodology, a novel approach that aggressively bypasses cache writebacks and responses to alleviate the NVM cache contention. Contrary to the existing bypass policies which do not pay adequate attention to the shared NVM cache contention, and focus too much on cache data reuse, POEM’s aggressive bypass significantly improves the overall system performance, even at the expense of data reuse. POEM also employs effective wear leveling to enhance the NVM cache endurance by careful redistribution of write operations across different cache lines. Across diverse workloads, POEM yields an average speedup of \(34\% \) over a naïve baseline and \(28.8\% \) over a state-of-the-art NVM cache bypass technique, while enhancing the cache endurance by \(15\% \) over the baseline. POEM also explores diverse design choices by exploiting a key policy parameter that assigns varying priorities to the two system-level objectives.