Caches are key microarchitectural components that modern processors use to improve memory access speed. However, attackers can readily mount timing side-channel attacks by exploiting the inherent timing difference between cache hits and misses, which poses a serious security threat to computer systems. In recent years, cache attacks have advanced considerably in attack techniques, targets, and platforms. At the same time, to measure cache vulnerability, researchers have worked on verifying security policies and have proposed a variety of evaluation metrics. However, current metrics have several limitations: they rely on restrictive assumptions, they cannot maintain their distinguishing ability at all times, they cannot explicitly quantify false-positive and false-negative noise, they depend on additional tools, and their verification environments are difficult to reproduce. To address these issues, we propose a set of evaluation metrics comprising the highest score (HS), minimum rounds to disclosure (MRD), and key score scissors differential (KSSD), and we give formal expressions for each in both practical and theoretical form. To facilitate accurate reproduction of the experimental environment and results, we build a dual-core RISC-V bare-metal system based on the open-source framework Chipyard. On this platform, we conduct comprehensive evaluations that exploit vulnerabilities in the RISC-V cache architecture to verify the validity of the metrics suite. Furthermore, we compare our metrics with the commonly used success rate (SR) and guessing entropy (GE). The comparison shows that the proposed metrics suite accurately depicts the effects of attacks. Moreover, KSSD maintains discrimination and converges quickly, MRD offers more practical guidance, and HS can specifically describe false-negative noise.
Finally, we use the proposed metrics to perform a theoretical evaluation of AES with different T-table implementations, analyze the system's false-positive and false-negative noise, and suggest how to determine the number of cache evictions under the random replacement policy.
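For readers unfamiliar with the two baseline metrics named above, the following is a minimal sketch of how success rate (SR) and guessing entropy (GE) are commonly computed from per-trial key-candidate scores. It assumes GE is taken in its usual sense as the average rank of the correct key and SR as the fraction of trials in which the correct key ranks first. The data layout (one score list per trial) and the function names `key_rank`, `success_rate`, and `guessing_entropy` are illustrative assumptions, not the implementation used in this work, and the sketch does not attempt to define HS, MRD, or KSSD.

```python
from typing import Sequence


def key_rank(scores: Sequence[float], correct_key: int) -> int:
    """Return the 1-based rank of the correct key under descending score order.

    Ties are resolved pessimistically: a candidate with an equal score
    is counted as ranking ahead of the correct key (an assumption).
    """
    target = scores[correct_key]
    ahead = sum(
        1 for k, s in enumerate(scores) if k != correct_key and s >= target
    )
    return 1 + ahead


def success_rate(trials: Sequence[Sequence[float]], correct_key: int) -> float:
    """Fraction of trials in which the correct key is ranked first."""
    ranks = [key_rank(t, correct_key) for t in trials]
    return sum(1 for r in ranks if r == 1) / len(ranks)


def guessing_entropy(trials: Sequence[Sequence[float]], correct_key: int) -> float:
    """Average rank of the correct key over all trials."""
    ranks = [key_rank(t, correct_key) for t in trials]
    return sum(ranks) / len(ranks)


# Hypothetical scores for three trials over three key candidates;
# candidate index 1 is assumed to be the correct key.
example_trials = [
    [0.1, 0.9, 0.3],
    [0.5, 0.4, 0.2],
    [0.2, 0.8, 0.7],
]
example_sr = success_rate(example_trials, correct_key=1)
example_ge = guessing_entropy(example_trials, correct_key=1)
```

In this toy input the correct key ranks first in two of three trials and second in the remaining one, so SR is 2/3 and GE is 4/3. Both values summarize only the rank of the correct key, which is one reason complementary metrics can be useful.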