Objective: To determine, in a head-to-head comparison, which of two RAND-based knee replacement appropriateness criteria systems performs better when compared against an externally validated method of judging good versus poor outcome.

Design: Longitudinal data from the Osteoarthritis Initiative (OAI) and the Multicenter Osteoarthritis Study (MOST) were combined to produce a dataset of 922 persons with knee arthroplasty, 602 of whom had adequate data for RAND classification and had their surgery within one year prior to a study visit. Data were used to determine appropriateness classification (i.e., Appropriate, Inconclusive, Rarely Appropriate) using modified versions of the first-generation and second-generation Escobar systems. Growth curve analyses and multivariable regression were used to compare the two systems.

Results: Neither system was associated with the gold standard measure of good versus poor outcome. Distributions of appropriateness categories for the second-generation system were inconsistent with current evidence for knee arthroplasty outcome. For example, 16% of participants were classified as Appropriate and 64% as Rarely Appropriate for pain outcome. Distributions for the first-generation system aligned with current evidence.

Conclusion: The first-generation modified version of the Escobar appropriateness system is superior to the newer version, but neither version was associated with our gold standard growth curve analyses. Both systems differentiate between patient classification groups only preoperatively and up to ten months following surgery. Reliance on appropriateness criteria to inform long-term outcome is not warranted.