Abstract
Learning semantic sentence embeddings benefits a wide variety of natural language processing tasks. Recently, methods that fine-tune pre-trained language models within a contrastive learning framework have been proposed and have achieved strong performance on sentence embedding benchmarks. However, sentence embeddings easily "overfit" to the contrastive learning objective: as contrastive training proceeds, the gap between the contrastive objective and the test tasks leads to unstable, or even declining, test-task performance. For this reason, existing methods rely on a labeled development set to frequently evaluate test-task performance and select the best checkpoint, which limits these models when labeled data is unavailable or extremely scarce. To address this problem, we propose Pseudo-Siamese network Mutual Learning (PSML) for self-supervised sentence embeddings, which reduces the gap between contrastive learning and test tasks. PSML takes mutual learning as its basic framework and consists of a main encoder and an auxiliary encoder; two mutual learning losses are constructed between the two encoders to share learning signals. The proposed framework and losses help the model optimize more stably and generalize better to test tasks such as semantic textual similarity. Extensive experiments on seven public semantic textual similarity datasets show that PSML outperforms previous unsupervised contrastive methods for sentence embeddings.
Moreover, PSML yields a stable test-task performance curve during training and achieves comparable performance without frequent evaluation on a labeled development set.
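The abstract does not give the exact form of PSML's losses. As a rough illustration of the two ingredients it names, the sketch below shows one plausible instantiation in plain numpy: an InfoNCE-style contrastive loss computed per encoder, plus a KL-divergence term between the two encoders' similarity distributions that could serve as a mutual-learning signal. All function names, the temperature value, and the choice of KL as the mutual-learning loss are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def info_nce(emb_a, emb_b, temp=0.05):
    """InfoNCE contrastive loss: row i of emb_a and row i of emb_b
    are a positive pair; all other rows in the batch are negatives."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T / temp                        # (N, N) cosine similarities
    probs = softmax(sim, axis=1)
    idx = np.arange(sim.shape[0])
    return -np.mean(np.log(probs[idx, idx]))    # -log p(positive | anchor)

def mutual_kl(sim_main, sim_aux, temp=0.05):
    """Hypothetical mutual-learning term: KL divergence between the
    similarity distributions produced by the main and auxiliary encoders."""
    p = softmax(sim_main / temp, axis=1)
    q = softmax(sim_aux / temp, axis=1)
    return np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1))
```

Under this reading, each encoder would be trained on its own contrastive loss while the KL term pulls the two encoders' similarity structures toward each other, regularizing the main encoder against overfitting the contrastive objective alone.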
IEEE/ACM Transactions on Audio, Speech, and Language Processing