A large variety of applications rely on deep learning to process big data, learn sophisticated features, and perform complicated tasks. Utilizing emerging non-volatile memory (NVM)'s unique characteristics, including the crossbar array structure and gray-scale cell resistances, to perform neural network (NN) computation is a well-studied approach in accelerating deep learning applications. Compared to other NVM technologies, STT-MRAM has its unique advantages in performing NN computation. However, the state-of-the-art research have not utilized STT-MRAM for deep learning acceleration due to its device- and architecture-level challenges. Consequently, this paper enables STT-MRAM, for the first time, as an effective and practical deep learning accelerator. In particular, it proposes a full-stack framework iCELIA spanning multiple design levels, including device-level fabrication, circuit-level enhancements, architecture-level synaptic weight quantization, and system-level accelerator design. The primary contributions of iCELIA over our prior work CELIA include a new non-uniform weight quantization scheme and much enhanced accelerator system design. The proposed framework significantly mitigates the model accuracy loss due to reduced data precision in a cohesive manner, constructing a comprehensive STT-MRAM accelerator system for fast NN computation with high energy efficiency and low cost.