Monolithic 3D (M3D) integration has recently been introduced as a viable solution for fine-grained 3D integration. Because conventional 3D integration uses relatively large, micro-scale through-silicon vias (TSVs), which incur a large area overhead, it is not cost-effective for small microarchitectural blocks such as L1 caches. In contrast, M3D integration offers nano-scale monolithic inter-tier vias (MIVs), which are much smaller than TSVs. Thus, M3D integration is considered feasible even for 3D stacking of small microarchitectural blocks, which shortens the wires within those blocks and leads to better performance and energy efficiency. In this paper, we quantify the architectural impact (in terms of performance, power, temperature, and area) of M3D integration for L1 caches. In our evaluation, the 8-layer stacked M3D L1 caches show 34.1 to 43.2 percent shorter access time than the 2D L1 cache. As a result, the M3D L1 caches improve the performance of SPEC CPU 2006 applications by 9.9 percent on average (up to 43.7 percent) compared to the conventional 2D L1 caches. In addition, the 8-layer stacked M3D L1 caches reduce dynamic energy by 58.9 to 60.8 percent and leakage power by 57.9 to 59.1 percent compared to the 2D L1 cache. Finally, although 3D stacking inevitably raises temperature relative to the 2D baseline, M3D integration provides better heat dissipation and lower power consumption than conventional TSV-based 3D integration (TSV-3D), and thus reduces peak L1 cache temperature by up to 7.6°C compared to TSV-3D.