Depression is a significant global health challenge. Still, many people suffering from depression remain undiagnosed. Furthermore, the assessment of depression can be subject to human bias. Natural Language Processing (NLP) models offer a promising solution. We investigated the potential of four NLP models (BERT, Llama2-13B, GPT-3.5, and GPT-4) for depression detection in clinical interviews. Participants (N = 82) underwent clinical interviews and completed a self-report depression questionnaire. NLP models inferred depression scores from interview transcripts. Questionnaire cut-off values for depression were used as a classifier for depression. GPT-4 showed the highest accuracy for depression classification (F1 score 0.73), while zero-shot GPT-3.5 initially performed with low accuracy (0.34), improved to 0.82 after fine-tuning, and achieved 0.68 with clustered data. GPT-4 estimates of symptom severity PHQ-8 score correlated strongly (r = 0.71) with true symptom severity. These findings demonstrate the potential of AI models for depression detection. However, further research is necessary before widespread deployment can be considered.