Channel frequency response (CFR) is fine-grained, location-specific information in WiFi systems that can be utilized in indoor positioning systems (IPSs). However, CFR-based IPSs can hardly achieve centimeter-level accuracy because of the limited bandwidth of WiFi systems. To achieve such accuracy using WiFi devices, we propose an IPS that fully harnesses the spatial diversity in multiple-input multiple-output (MIMO) WiFi systems, which leads to an effective bandwidth much larger than the bandwidth of a WiFi channel. During the training phase, the proposed IPS obtains CFRs associated with locations of interest on multiple antenna links. In the positioning phase, the IPS captures instantaneous CFRs from the location to be estimated and compares them with the CFRs acquired in the training phase via the time-reversal resonating strength (TRRS), with residual synchronization errors compensated. Extensive experimental results in an office environment with a measurement resolution of 5 cm demonstrate that, with a single pair of WiFi devices and an effective bandwidth of 321 MHz, the proposed IPS achieves detection rates of 99.91% and 100% with false alarm rates of 1.81% and 1.65% under the line-of-sight (LOS) and non-LOS (NLOS) scenarios, respectively. The proposed IPS is also robust against environmental dynamics. Moreover, experimental results with a measurement resolution of 0.5 cm demonstrate a localization accuracy of 1-2 cm in the NLOS scenario.
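
To make the matching step concrete, the following is a minimal sketch of how a resonating-strength comparison between two CFRs might be computed. It rests on assumptions not stated above: the strength is taken as the normalized squared magnitude of the inner product of the two CFRs, residual synchronization errors are modeled as a common phase plus a phase that is linear in the subcarrier index, and the linear phase is searched on a grid with a zero-padded FFT. The names `trrs`, `locate`, `n_fft`, and `threshold` are hypothetical, and the threshold value is illustrative only. Combining multiple antenna links into a larger effective bandwidth is not shown.

```python
import numpy as np


def trrs(h1, h2, n_fft=1024):
    """Resonating strength in [0, 1] between two CFR vectors (assumed form).

    A residual timing offset is modeled as a linear phase across
    subcarriers; the zero-padded FFT searches over candidate slopes.
    """
    h1 = np.asarray(h1, dtype=complex)
    h2 = np.asarray(h2, dtype=complex)
    energy = np.sum(np.abs(h1) ** 2) * np.sum(np.abs(h2) ** 2)
    if energy == 0:
        return 0.0
    # Per-subcarrier product; the magnitude taken below removes any common phase.
    prod = h1 * np.conj(h2)
    # Each FFT bin evaluates the inner product under one candidate phase slope.
    n = max(n_fft, prod.size)
    spectrum = np.fft.fft(prod, n=n)
    return float(np.max(np.abs(spectrum)) ** 2 / energy)


def locate(cfr, fingerprints, threshold=0.8):
    """Return (best matching location or None, its score).

    `fingerprints` maps a location label to a training-phase CFR.
    `threshold` is an illustrative value, not one taken from the text.
    """
    scores = {name: trrs(cfr, ref) for name, ref in fingerprints.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]
```

In this sketch, a measurement that differs from a stored fingerprint only by a common phase and an on-grid linear phase scores approximately 1, while CFRs from unrelated channels score much lower; the threshold then trades detection rate against false alarm rate.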