Recent progress in artificial intelligence (AI) has broadened the range of intelligent applications on the Internet of Multimedia Things (IoMT). Driven by the growing demand for intelligence in the online fashion industry, using AI techniques to mine users' clothing collocations from the fashion data generated by IoMT systems is an important yet challenging task. Against this background, fashion compatibility modeling (FCM), which aims to estimate the matching degree of a given outfit, has attracted considerable attention in the multimedia analysis field. However, most existing studies either fail to fully leverage multimodal content or ignore the sparse associations between fashion items. In this paper, we propose a novel multimodal high-order relationship inference network (MHRIN) for the fashion compatibility modeling task. In MHRIN, we enrich the multimodal representations of fashion items by incorporating category correlations and injecting high-order item-item connectivity. Concretely, considering that fashion collocations depend on semantic relevance patterns between categories, we design a category correlation learning module to adaptively learn category representations. On this basis, a hierarchical multimodal fusion module aggregates the representations of multiple modalities to generate visual-semantic embeddings. To address the sparsity of item-item matching interactions, we further refine the final representations with a high-order message propagation module that absorbs rich connectivity information. Experiments on a publicly available dataset demonstrate the superiority of MHRIN over state-of-the-art methods.
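To make the three-stage pipeline concrete, the sketch below shows one plausible PyTorch realization of the modules named in the abstract: learnable category representations refined by attention, concatenation-based multimodal fusion, and two rounds of normalized neighbor aggregation as a stand-in for high-order message propagation. All names (`MHRINSketch`, `outfit_score`), dimensions, and mechanism choices here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MHRINSketch(nn.Module):
    """Hypothetical sketch of the three MHRIN stages described in the abstract."""

    def __init__(self, num_categories, vis_dim=512, txt_dim=300, dim=128):
        super().__init__()
        # (1) Category correlation learning: learnable category embeddings,
        # refined by self-attention over all categories (an assumed mechanism).
        self.cat_emb = nn.Embedding(num_categories, dim)
        self.cat_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # (2) Hierarchical multimodal fusion: project each modality, then fuse
        # visual, textual, and category signals into one visual-semantic embedding.
        self.vis_proj = nn.Linear(vis_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, vis_feat, txt_feat, cat_idx, adj):
        # vis_feat: (N, vis_dim) image features; txt_feat: (N, txt_dim) text
        # features; cat_idx: (N,) category ids; adj: (N, N) item-item graph.
        # (1) Adaptively refine category representations via self-attention.
        cats = self.cat_emb.weight.unsqueeze(0)            # (1, C, dim)
        cats, _ = self.cat_attn(cats, cats, cats)
        cats = cats.squeeze(0)[cat_idx]                    # (N, dim)
        # (2) Fuse the modalities with the item's category representation.
        v = F.relu(self.vis_proj(vis_feat))
        t = F.relu(self.txt_proj(txt_feat))
        h = F.relu(self.fuse(torch.cat([v, t, cats], dim=-1)))
        # (3) High-order message propagation: stacking two rounds of
        # degree-normalized neighbor aggregation injects multi-hop
        # item-item connectivity into the final representations.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        for _ in range(2):
            h = h + (adj @ h) / deg
        return h


def outfit_score(item_emb):
    # Compatibility score as the mean pairwise cosine similarity of the
    # outfit's items (a common, simple readout; the paper's scoring may differ).
    z = F.normalize(item_emb, dim=-1)
    sim = z @ z.t()
    n = z.size(0)
    return (sim.sum() - n) / (n * (n - 1))
```

Stacking the aggregation step twice is what makes the propagation "high-order": each item's embedding then reflects not only its direct matching partners but also their partners, which helps when direct item-item associations are sparse.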