Monolithic 3D stacking of complementary FET (CFET) SRAM arrays increases integration density multi-fold while retaining the inherent SRAM advantages of low write power and near-infinite endurance. We propose stacking multiple 8-transistor CFET-SRAM layers on regular CMOS periphery to achieve an ultra-high-density array for computing-in-memory (CIM). CFET and regular CMOS (FinFET) devices are measured and calibrated with the BSIM-CMG compact model. SPICE simulations are performed to evaluate the delay of CIM operation, the power consumption, and the analog computational error due to device non-linearity. The impact of device non-linearity on neural network inference accuracy is evaluated using the CIMulator simulation platform. The lower CFET current drive due to the amorphous (deposited) silicon channel is shown to have negligible impact on CIM operational delay in many cases, because the maximum allowable current that still maintains an accurate weighted sum is limited by wiring resistance, not transistor drive strength. Compared to a regular 2D CMOS FinFET array, CFET SRAM cells show an improvement of up to 57.19% in TOPS/W. Furthermore, the performance in TOPS/W mm$^2$ is improved by up to $19\times$, a factor proportional to the number of stacked layers, which makes monolithically stacked CFET SRAM cells highly promising for future edge intelligence.