Abstract

In online education systems, learning exercise representation is fundamental to many applications, such as exercise retrieval and recommendation. To capture the heterogeneous information in exercises (i.e., texts and images), deep multimodal approaches have shown promising performance. However, these methods have two limitations: (1) they attend to context on one side only, failing to exploit the future context chunks in an exercise; and (2) they cannot guarantee representation quality because labelled data are scarce. In this paper, we propose a bidirectional contrastive representation network (BCRNet) to tackle these issues. First, we construct a representation module with a masking constraint loss that takes the bidirectional context of an exercise into account. Second, we design a contrastive learning approach that uses a multimodal contrastive loss to reshape the multimodal representation space and improve representation quality without labelled data. Moreover, a text-image matching strategy provides semantic links between texts and images. Experiments on a real-world dataset demonstrate that BCRNet performs significantly better than strong baselines.
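The abstract does not give the exact form of the multimodal contrastive loss, but losses of this kind are commonly instantiated as a symmetric InfoNCE objective over paired text and image embeddings. The sketch below is illustrative only: the function name, the temperature value, and the assumption that matched text-image pairs share a batch index are all assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired text/image
    embeddings (a common choice; BCRNet's exact loss may differ).
    Row i of text_emb is assumed to match row i of image_emb; all
    other in-batch pairs serve as negatives."""
    # L2-normalize so the dot product equals cosine similarity
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # Pairwise text-to-image similarities, scaled by temperature
    logits = text_emb @ image_emb.t() / temperature

    # The positive for sample i sits on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the text-to-image and image-to-text directions
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2
```

Pulling matched pairs together while pushing apart in-batch mismatches is one way such a loss can reshape the multimodal representation space without any labelled data.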
