Diagnostic reliability is essential for the science and practice of psychology, in part because reliability is necessary for validity. Recently, the DSM-5 field trials documented lower diagnostic reliability than past field trials and the general research literature, resulting in substantial criticism of the DSM-5 diagnostic criteria. Rather than indicating specific problems with DSM-5, however, the field trials may have revealed long-standing diagnostic issues that have been hidden due to a reliance on audio/video recordings for estimating reliability. We estimated the reliability of DSM-IV diagnoses using both the standard audio-recording method and the test-retest method used in the DSM-5 field trials, in which different clinicians conduct separate interviews. Psychiatric patients (N = 339) were diagnosed using the SCID-I/P; 218 were diagnosed a second time by an independent interviewer. Diagnostic reliability using the audio-recording method (N = 49) was "good" to "excellent" (M κ = .80) and comparable to the DSM-IV field trials estimates. Reliability using the test-retest method (N = 218) was "poor" to "fair" (M κ = .47) and similar to DSM-5 field-trials' estimates. Despite low test-retest diagnostic reliability, self-reported symptoms were highly stable. Moreover, there was no association between change in self-report and change in diagnostic status. These results demonstrate the influence of method on estimates of diagnostic reliability.