Abstract We propose a new collection of benchmark problems in mechanizing the metatheory of programming languages, in order to compare and push the state of the art of proof assistants. In particular, we focus on proofs using logical relations (LRs) and propose establishing strong normalization of a simply typed calculus with a proof by Kripke-style LRs as a benchmark. We give a modern view of this well-understood problem by formulating our LR on well-typed terms. Using this case study, we share some of the lessons learned tackling this problem in different dependently typed proof environments. In particular, we consider the mechanization in Beluga, a proof environment that supports higher-order abstract syntax encodings and contrast it to the development and strategies used in general-purpose proof assistants such as Coq and Agda. The goal of this paper is to engage the community in discussions on what support in proof environments is needed to truly bring mechanized metatheory to the masses and engage said community in the crafting of future benchmarks.