With the acceleration of the pace of work and life, people are facing more and more pressure, which increases the probability of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor–patient ratio in the world. A promising development is that physiological and psychological studies have found some differences in speech and facial expression between patients with depression and healthy individuals. Consequently, to improve current medical care, Deep Learning (DL) has been used to extract a representation of depression cues from audio and video for automatic depression detection. To classify and summarize such research, we introduce the databases and describe objective markers for automatic depression estimation. We also review the DL methods for automatic detection of depression to extract a representation of depression from audio and video. Lastly, we discuss challenges and promising directions related to the automatic diagnoses of depression using DL.