Depression has become a prevalent mental illness, and depression detection based on multimodal data is an important research topic. Internet of Medical Things devices can provide data resources such as text, audio, and visual data, which are valuable for depression detection. However, previous studies have concentrated on using a single type of feature from each modality, either low-dimensional pre-designed features or high-level deep representations, which cannot fully capture the emotional information contained in the data. Against this background, we design an intra-modal and inter-modal fusion framework, called IIFDD, for corpus-based depression detection. The intra-modal fusion module integrates low-dimensional pre-designed features and high-dimensional deep representations from the same modality to better learn semantic information. The inter-modal fusion module then fuses features from different modalities with attention mechanisms and uses the fused result for depression classification. Experiments on two Chinese depression corpora with acoustic, textual, and visual features show that IIFDD achieves state-of-the-art performance for depression detection.
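To make the two-stage design concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of the abstract's pipeline: each modality's pre-designed features and deep representations are first merged by an intra-modal module, and the resulting per-modality vectors are then combined by an attention-based inter-modal module that feeds a classifier. The gating mechanism, multi-head attention, layer names, and feature dimensions are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of intra-modal + inter-modal fusion for depression
# detection. All dimensions and module choices are assumptions for clarity.
import torch
import torch.nn as nn


class IntraModalFusion(nn.Module):
    """Fuse low-dimensional pre-designed features with high-dimensional
    deep representations from the same modality via a learned gate."""

    def __init__(self, handcrafted_dim: int, deep_dim: int, hidden_dim: int):
        super().__init__()
        self.proj_hc = nn.Linear(handcrafted_dim, hidden_dim)
        self.proj_deep = nn.Linear(deep_dim, hidden_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.Sigmoid()
        )

    def forward(self, hc: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        h_hc, h_deep = self.proj_hc(hc), self.proj_deep(deep)
        # Element-wise mixing weights decide how much each feature type contributes.
        g = self.gate(torch.cat([h_hc, h_deep], dim=-1))
        return g * h_hc + (1.0 - g) * h_deep


class InterModalFusion(nn.Module):
    """Fuse per-modality vectors with scaled dot-product attention,
    then classify depressed vs. non-depressed."""

    def __init__(self, hidden_dim: int, num_classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, modality_vectors: torch.Tensor) -> torch.Tensor:
        # modality_vectors: (batch, n_modalities, hidden_dim)
        fused, _ = self.attn(modality_vectors, modality_vectors, modality_vectors)
        pooled = fused.mean(dim=1)  # average over the three modalities
        return self.classifier(pooled)


if __name__ == "__main__":
    batch = 8
    # Hypothetical feature sizes: e.g. BERT-like text embeddings (768-d),
    # deep audio/visual encoders (512-d), and small hand-crafted vectors.
    intra_text = IntraModalFusion(handcrafted_dim=20, deep_dim=768, hidden_dim=128)
    intra_audio = IntraModalFusion(handcrafted_dim=88, deep_dim=512, hidden_dim=128)
    intra_video = IntraModalFusion(handcrafted_dim=49, deep_dim=512, hidden_dim=128)
    inter = InterModalFusion(hidden_dim=128)

    t = intra_text(torch.randn(batch, 20), torch.randn(batch, 768))
    a = intra_audio(torch.randn(batch, 88), torch.randn(batch, 512))
    v = intra_video(torch.randn(batch, 49), torch.randn(batch, 512))
    logits = inter(torch.stack([t, a, v], dim=1))  # (batch, 2)
    print(logits.shape)
```

The gate lets the network weight pre-designed and deep features per dimension rather than simply concatenating them, and self-attention over the stacked modality vectors lets each modality attend to the others before pooling; both are plausible realizations of the abstract's description rather than its exact mechanisms.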