Abstract

The cross-modal molecule retrieval (Text2Mol) task aims to bridge the semantic gap between molecules and natural language descriptions. Existing solutions to this non-trivial problem rely on a graph convolutional network (GCN) and cross-modal attention with contrastive learning to obtain reasonable results. However, the following issues remain: 1) the cross-modal attention mechanism only benefits text representations and cannot provide helpful information for molecule representations; 2) the GCN-based molecule encoder ignores edge features and the varying importance of a molecule's substructures; 3) the retrieval learning loss function is rather simplistic. This paper further investigates the Text2Mol problem and proposes a novel Adversarial Modality Alignment Network (AMAN) to sufficiently learn both description and molecule information. Our method uses SciBERT as the text encoder and a graph transformer network as the molecule encoder to generate multimodal representations. An adversarial network then aligns these modalities interactively, while a triplet loss performs retrieval learning and further strengthens the modality alignment. Experiments on the ChEBI-20 dataset show the effectiveness of our AMAN method compared with baselines.
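The triplet objective mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: the cosine-similarity distance and the margin value of 0.3 are illustrative choices, and real training would operate on batched encoder outputs rather than plain Python lists.

```python
import math

def cosine_sim(u, v):
    # Cosine similarity between two embedding vectors (plain lists here).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on similarity: the text anchor should be closer to its matching
    # molecule (positive) than to a non-matching one (negative) by at least
    # `margin`; the loss is zero once that gap is achieved.
    return max(0.0, cosine_sim(anchor, negative) - cosine_sim(anchor, positive) + margin)

# Toy 2-D embeddings: a well-aligned triplet incurs no loss,
# while a misaligned one is penalized.
aligned = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])      # 0.0
misaligned = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])   # 1.3
```

In a retrieval setting such as Text2Mol, the anchor would be a text embedding and the positive/negative would be molecule embeddings, so minimizing this loss pushes matched text-molecule pairs together across modalities.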
