Smart campus initiatives build on learning the characteristics of diverse students and evaluating their feedback, with the aim of enabling intelligent, accurate, and customized education. Mining students' social media data, especially through topic modeling, provides a non-intrusive way to learn their immediate thoughts and intentions. However, such data is multi-modal (i.e., it contains text, images, and videos), and handling it is challenging because of both modality dependence and missing modalities. In this paper, we present a novel deep topical correlation analysis (DTCA) approach, which achieves robust and accurate topic detection for microblogs while handling both of these challenges. In particular, bidirectional recurrent neural networks and convolutional neural networks are used to learn deep textual and visual features, respectively. A canonical correlation analysis-based fusion scheme is then proposed, with two innovations that address modality dependence and missing modalities: a filter gate that captures the modality dependency and a matrix-projection-based component that handles the missing modality. DTCA is trained end to end, with the parameters of the visual, textual, and cross-modal prediction parts learned jointly. We further release a large-scale cross-modal Twitter dataset for topic detection, denoted TM-Twitter. On this dataset, we conduct extensive quantitative evaluations against several state-of-the-art and alternative approaches. The reported results show significant performance gains, demonstrating the merits of the proposed DTCA.