Resistive random access memory (RRAM)-based compute-in-memory (CIM) has shown great potential for accelerating deep neural network (DNN) inference. However, device characteristics, such as low resistance values, susceptibility to drift, and single-level cells, may limit the capabilities of RRAM-based CIM. In addition, prior works have generally relied on off-chip write-verify to tighten RRAM resistance distributions and on off-chip analog-to-digital converter (ADC) references to fine-tune partial-sum quantization. Although off-chip techniques are viable for testing, they may be unsuitable for practical applications. In this work, we present an RRAM-CIM macro that accelerates DNN inference. The chip features: 1) multi-level cell (MLC) RRAM for improved compute performance and density; 2) sparsity-aware input control that leverages the high activation sparsity of DNN models; 3) on-chip write-verify to speed up initial weight programming and to periodically refresh cells, compensating for resistance drift under stress; and 4) on-chip ADC reference generation that provides column-wise tunability and temperature stability, guaranteeing a CIFAR-10 accuracy of 85.8% at 120 °C. The design is fabricated in a TSMC 40-nm process with embedded RRAM technology and achieves a macro-level peak performance of 97.8 GOPS/mm² and 44.5 TOPS/W for multiply-and-accumulate (MAC) operations on a VGG-8 network with ternary weights.