The modern software engineering process is evolutionary, with commits/patches begetting new versions of code, progressing steadily toward improved systems. In recent years, program analysis and verification tools have exploited version-based reasoning, where new code can be seen in terms of how it has changed from the previous version. When considering program versions, refinement seems a natural fit and, in recent decades, researchers have weakened classical notions of concrete refinement and program equivalence to capture similarities as well as differences between programs. For example, Benton, Yang and others have worked on state-based refinement relations . In this paper, we explore a form of weak refinement based on trace relations rather than state relations. The idea begins by partitioning traces of a program C 1 into trace classes, each identified via a restriction r 1 . For each class, we specify similar behavior in the other program C 2 via a separate restriction r 2 on C 2 . Still, these two trace classes may not yet be equivalent so we further permit a weakening via a binary relation A on traces, that allows one to, for instance disregard unimportant events, relate analogous atomic events, etc. We address several challenges that arise. First, we explore one way to specify trace refinement relations by instantiating the framework to Kleene Algebra with Tests (KAT) due to Kozen. We use KAT intersection for restriction, KAT hypotheses for A , KAT inclusion for refinement, and have proved compositionality. Next, we present an algorithm for automatically synthesizing refinement relations, based on a mixture of semantic program abstraction, KAT inclusion, a custom edit-distance algorithm on counterexamples, and case-analysis on nondeterministic branching. We have proved our algorithm to be sound. Finally, we implemented our algorithm as a tool called Knotical, on top of Interproc and Symkat. We demonstrate promising first steps in synthesizing trace refinement relations across a hand-crafted collection of 37 benchmarks that include changing fragments of array programs, models of systems code, and examples inspired by the thttpd and Merecat web servers.