Metaphor is often regarded as a departure from semantic selection and common sense in semantic composition, and its fundamental characteristic is semantic conflict. Recognizing intricate and elusive metaphors has long been a formidable challenge in natural language processing. In this paper, we introduce SC-Net, a novel multimodal metaphor detection model. Drawing on linguistic theories of metaphor identification, the model exploits the semantic conflicts inherent in metaphors to detect whether multimodal samples are metaphorical. SC-Net projects each modality into two distinct subspaces: a modality-semantic feature space and a modality-latent metaphorical feature space. The features derived from these spaces give a comprehensive view from which both intramodal and intermodal semantic conflicts are captured for prediction. Experiments on the Chinese, English, and Bilingual Met-meme datasets and the MultiBully dataset show that SC-Net achieves state-of-the-art performance.
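To make the two-subspace idea concrete, the following is a minimal illustrative sketch, not the SC-Net implementation. It assumes that each modality arrives as a fixed-length feature vector, that the two projections are plain linear maps (random here, learned in a real model), and that a "conflict" can be scored as cosine distance between projected vectors. All names (`build_projections`, `conflict_features`, the dimensions, and the choice of cosine distance) are hypothetical and introduced only for illustration.

```python
import math
import random


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned projection layer (assumption)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vec):
    """Apply a linear map (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine_distance(a, b):
    """1 - cosine similarity, clamped so the result lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as orthogonal to everything
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - cos


def build_projections(text_dim, image_dim, out_dim, seed=0):
    """Two projections per modality: a semantic and a latent-metaphorical subspace."""
    rng = random.Random(seed)
    return {
        name: {
            "semantic": make_projection(in_dim, out_dim, rng),
            "metaphor": make_projection(in_dim, out_dim, rng),
        }
        for name, in_dim in (("text", text_dim), ("image", image_dim))
    }


def conflict_features(text_vec, image_vec, projections):
    """Score hypothetical intramodal and intermodal conflicts for one sample."""
    spaces = {
        name: {
            space: project(projections[name][space], vec)
            for space in ("semantic", "metaphor")
        }
        for name, vec in (("text", text_vec), ("image", image_vec))
    }
    return {
        # Intramodal: semantic vs. latent-metaphorical view of the same modality.
        "intra_text": cosine_distance(
            spaces["text"]["semantic"], spaces["text"]["metaphor"]
        ),
        "intra_image": cosine_distance(
            spaces["image"]["semantic"], spaces["image"]["metaphor"]
        ),
        # Intermodal: semantic view of the text vs. semantic view of the image.
        "inter_semantic": cosine_distance(
            spaces["text"]["semantic"], spaces["image"]["semantic"]
        ),
    }


if __name__ == "__main__":
    projections = build_projections(text_dim=8, image_dim=6, out_dim=4)
    sample_rng = random.Random(1)
    text_vec = [sample_rng.random() for _ in range(8)]
    image_vec = [sample_rng.random() for _ in range(6)]
    print(conflict_features(text_vec, image_vec, projections))
```

In a trained model the three scores (or the projected vectors themselves) would be passed to a classifier. Here they only show how features from the two subspaces can expose both kinds of conflict described above.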