Spin-transfer torque random access memory (STT-RAM), an emerging nonvolatile memory technology, offers a very dense array structure and extremely low leakage power consumption. It shows great potential to replace conventional static random access memory in the next-generation on-chip cache memory of microprocessors and graphics processing units. The multilevel cell (MLC) design of STT-RAM, which stores two or more bits in one cell, has attracted significant attention because it potentially offers higher storage capacity and better system performance. In this paper, we first quantitatively evaluated the data storage density of MLC STT-RAM. Our results revealed limited density improvement, because the high write current amplitude requirement and the asymmetry of switching behavior demand a large access transistor. Moreover, the read and write accesses of existing MLC STT-RAM cache designs require two-step operations. The system-level evaluation shows that the resulting long access latency can offset the performance gain brought by the larger cache size, and even degrade system performance for some applications. To unleash the potential of MLC STT-RAM cache, we proposed a new design based on cross-layer co-optimization. The memory cell structure integrates a reversed stacking of the magnetic tunnel junction for a more balanced device and design tradeoff. At the architecture level, we presented an adaptive mode switching mechanism: based on an application’s memory access behavior, the MLC STT-RAM cache can dynamically switch between a low-latency single-level cell (SLC) mode and a high-capacity MLC mode. Furthermore, we divided cache lines into fast and slow regions and investigated new data migration policies that allocate frequently accessed data to the fast regions. Simulation results show that the proposed techniques improve system performance by 10.2% and reduce cache energy consumption by 9.5% compared with a conventional MLC STT-RAM cache design.
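The latency-versus-capacity tradeoff behind adaptive mode switching can be illustrated with a small sketch. This is not the mechanism proposed in the paper: it assumes, purely for illustration, that a controller can estimate a hit latency and a miss rate for each mode over a sampling interval and then picks the mode with the lower average memory access time. The names `ModeEstimate`, `amat`, and `choose_mode`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeEstimate:
    """Per-mode inputs for one sampling interval (all values hypothetical)."""
    hit_latency: float  # cache hit latency in cycles
    miss_rate: float    # estimated miss rate in this mode, between 0.0 and 1.0


def amat(est: ModeEstimate, miss_penalty: float) -> float:
    """Average memory access time: hit latency + miss rate * miss penalty."""
    return est.hit_latency + est.miss_rate * miss_penalty


def choose_mode(slc: ModeEstimate, mlc: ModeEstimate, miss_penalty: float) -> str:
    """Return the mode with the lower estimated AMAT; ties go to SLC."""
    return "SLC" if amat(slc, miss_penalty) <= amat(mlc, miss_penalty) else "MLC"


# Small working set: extra MLC capacity does not lower the miss rate,
# so the faster one-step SLC access is preferred.
small = choose_mode(ModeEstimate(10, 0.02), ModeEstimate(20, 0.02), miss_penalty=200)

# Large working set: the MLC capacity cuts misses enough to outweigh
# the slower two-step access.
large = choose_mode(ModeEstimate(10, 0.30), ModeEstimate(20, 0.10), miss_penalty=200)
```

Under these assumed numbers, the first case favors SLC mode and the second favors MLC mode, which mirrors the observation that a larger but slower cache helps only when the added capacity removes enough misses.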