Background: The aim of this study was to compare the validity of the Hospital Anxiety and Depression Scale (HADS), the WHO (five) Well Being Index (WBI-5), the Patient Health Questionnaire (PHQ), and physicians’ recognition of depressive disorders, and to recommend specific cut-off points for clinical decision making. Methods: A total of 501 outpatients completed each of the three depression screening questionnaires and underwent the Structured Clinical Interview for DSM-IV (SCID) as the criterion standard. In addition, treating physicians were asked to give their psychiatric diagnoses. Criterion validity and Receiver Operating Characteristics (ROC) were determined. Areas under the curves (AUCs) were compared statistically. Results: All depression scales showed excellent internal consistency (Cronbach’s α: 0.85–0.90). For ‘major depressive disorder’, the operating characteristics of the PHQ were significantly superior to those of both the HADS and the WBI-5. For ‘any depressive disorder’, the PHQ again showed the best operating characteristics, but the overall difference did not reach statistical significance at the 5% level. Cut-off points that can be recommended for the screening of ‘major depressive disorder’ had sensitivities of 98% (PHQ), 94% (WBI-5), and 85% (HADS). Corresponding specificities were 80% (PHQ), 78% (WBI-5), and 76% (HADS). In contrast, physicians’ recognition of ‘major depressive disorder’ was poor (sensitivity, 40%; specificity, 87%). Limitations: Our sample may not be representative of medical outpatients, but sensitivity and specificity are independent of disorder prevalence. Conclusions: All three questionnaires performed well in depression screening, but significant differences in criterion validity existed. These results may be helpful in the selection of questionnaires and cut-off points.
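The limitation stated above rests on the fact that sensitivity and specificity do not change with disorder prevalence, whereas predictive values do. The following minimal sketch illustrates this using the sensitivities and specificities reported above for ‘major depressive disorder’. The prevalence values and the helper name `predictive_values` are assumptions chosen for illustration and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# (sensitivity, specificity) for 'major depressive disorder' as reported above.
REPORTED = {
    "PHQ": (0.98, 0.80),
    "WBI-5": (0.94, 0.78),
    "HADS": (0.85, 0.76),
    "Physicians": (0.40, 0.87),
}

# Hypothetical prevalences, for illustration only (not study data).
HYPOTHETICAL_PREVALENCES = (0.05, 0.10, 0.20)

if __name__ == "__main__":
    for prevalence in HYPOTHETICAL_PREVALENCES:
        for name, (sens, spec) in REPORTED.items():
            ppv, npv = predictive_values(sens, spec, prevalence)
            print(f"prevalence={prevalence:.2f}  {name:<10}  "
                  f"PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises as the assumed prevalence rises, so predictive values from one setting should not be carried over to a setting with a different base rate.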