Abstract

Cross-modal anomaly detection is a relatively new and challenging research topic in machine learning that aims to identify anomalies whose patterns are disparate across different modalities. To the best of our knowledge, this topic has yet to be well studied, and existing works often suffer from incomplete detection of anomalous data and from low data utilization. To alleviate these limitations, this paper proposes an efficient deep cross-modal anomaly detection approach via Triple-adaptive Network and Bi-quintuple Contrastive Learning (TN-BCL), which is among the earliest attempts to detect various cross-modal anomalies within heterogeneous multi-modal data. Specifically, a triple-adaptive network is explicitly designed to identify various anomalies whose patterns are disparate in both the single-modal and the cross-modal scenarios. On the one hand, the top branch network is utilized to adaptively detect the attribute anomalies and part of the mixed anomalies in multi-modal data samples. On the other hand, the bottom two-branch network, with shared residual blocks, is leveraged to learn discriminative cross-modal embeddings. At the same time, an efficient bi-quintuple contrastive learning method is designed to enhance the feature correlation between data with the same attribute, while maximally enlarging the feature difference between data with different attributes. In addition, a bidirectional learning scheme is employed to significantly improve data utilization. Through the joint exploitation of these components, different kinds of anomalous samples can be well detected across different modalities. Extensive experiments show that the proposed framework outperforms state-of-the-art competing methods by a large margin.
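The idea of pulling same-attribute cross-modal embeddings together while pushing different-attribute ones apart, in both directions, can be illustrated with a minimal sketch. The abstract does not define the bi-quintuple loss, so the code below swaps in a plain bidirectional triplet margin loss as a stand-in; the function names, the Euclidean distance, the margin value, and the toy embeddings are all assumptions made for illustration and are not taken from TN-BCL.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directional_triplet_loss(anchors, anchor_labels, others, other_labels, margin):
    """Triplet margin loss with anchors in one modality and
    positives/negatives drawn from the other modality.

    Hypothetical stand-in, not the paper's bi-quintuple loss.
    """
    losses = []
    for a, la in zip(anchors, anchor_labels):
        # Same-attribute samples in the other modality act as positives,
        # different-attribute samples act as negatives.
        positives = [o for o, lo in zip(others, other_labels) if lo == la]
        negatives = [o for o, lo in zip(others, other_labels) if lo != la]
        for p in positives:
            for n in negatives:
                losses.append(max(0.0, euclidean(a, p) - euclidean(a, n) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def bidirectional_triplet_loss(x, labels_x, y, labels_y, margin=1.0):
    """Sum of the x-to-y and y-to-x losses, so every sample serves as an
    anchor once per direction (an assumed reading of 'bidirectional')."""
    return (directional_triplet_loss(x, labels_x, y, labels_y, margin)
            + directional_triplet_loss(y, labels_y, x, labels_x, margin))


if __name__ == "__main__":
    # Toy 2-D embeddings for two modalities, with attribute labels 0 and 1.
    img = [[0.0, 0.0], [5.0, 5.0]]
    txt = [[0.0, 0.1], [5.0, 5.1]]
    labels = [0, 1]
    print(bidirectional_triplet_loss(img, labels, txt, labels))
    # Reversing the text embeddings mismatches the attributes across modalities.
    print(bidirectional_triplet_loss(img, labels, txt[::-1], labels))
```

In this toy setup, well-aligned embeddings incur no loss, while a pairing whose attributes disagree across modalities incurs a positive loss; a detector in this spirit could treat large cross-modal distances as evidence of an anomaly, though the scoring rule actually used by TN-BCL is not stated in the abstract.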
