Abstract

Cross-modal clustering (CMC) aims to enhance the clustering performance by exploring complementary information from multiple modalities. However, the performances of existing CMC algorithms are still unsatisfactory due to the conflict of heterogeneous modalities and the high-dimensional non-linear property of individual modality. In this paper, a novel deep mutual information maximin (DMIM) method for cross-modal clustering is proposed to maximally preserve the shared information of multiple modalities while eliminating the superfluous information of individual modalities in an end-to-end manner. Specifically, a multi-modal shared encoder is firstly built to align the latent feature distributions by sharing parameters across modalities. Then, DMIM formulates the complementarity of multi-modalities representations as an mutual information maximin objective function, in which the shared information of multiple modalities and the superfluous information of individual modalities are identified by mutual information maximization and minimization respectively. To solve the DMIM objective function, we propose a variational optimization method to ensure it converge to a local optimal solution. Moreover, an auxiliary overclustering mechanism is employed to optimize the clustering structure by introducing more detailed clustering classes. Extensive experimental results demonstrate the superiority of DMIM method over the state-of-the-art cross-modal clustering methods on IAPR-TC12, ESP-Game, MIRFlickr and NUS-Wide datasets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.