Shared-memory chip-multiprocessor (CMP) architectures define memory consistency models that establish the ordering rules for memory operations from multiple threads. Validating the correctness of a CMP's implementation of its memory consistency model requires extensive monitoring and analysis of memory accesses while multiple threads are executing on the CMP. In this paper, we present a low overhead solution for observing, recording and analyzing shared-memory interactions for use in an emulation and/or post-silicon validation environment. Our approach leverages portions of the CMP's own data caches, augmented only by a small amount of hardware logic, to log information relevant to memory accesses. After transferring this information to a central memory location, we deploy our own analysis algorithm to detect any possible memory consistency violations. We build on the property that a violation corresponds to a cycle in an appropriately defined graph representing memory interactions. The solution we propose allows a designer to choose where to run the analysis algorithm: 1) on the CMP itself; 2) on a separate processor residing on the validation platform; or 3) off-line on a separate host machine. Our experimental results show an 83% bug detection rate, in our testbed CMP, over three distinct memory consistency models, namely: relaxed-memory order, total-store order, and sequential consistency. Finally, note that our solution can be disabled in the final product, leading to zero performance overhead and a per-core area overhead that is smaller than the size of a physical integer register file in a modern processor.