Abstract
Recently, much research on learning cross-lingual word embeddings without parallel data has succeeded by exploiting word-level isomorphism among languages. However, unsupervised cross-lingual sentence representation, which aims to learn a unified semantic space without parallel data, remains underexplored. Although many cross-lingual tasks can be solved by learning a unified sentence representation across languages built on cross-lingual word embeddings, the performance is not competitive with supervised counterparts. In this paper, we propose a novel framework for unsupervised cross-lingual sentence representation learning that exploits linguistic isomorphism at both the word and sentence levels. After generating pseudo-parallel sentences based on pre-trained cross-lingual word embeddings, the framework iteratively performs sentence modeling, word embedding tuning, and parallel-sentence updating. Our experiments show that the proposed framework achieves state-of-the-art results on many cross-lingual tasks and also improves the quality of the cross-lingual word embeddings. The code and pre-trained encoders will be released upon publication.
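The abstract only names the pipeline steps, so as a concrete illustration, the sketch below shows what the pseudo-parallel mining step could look like. The mean-of-word-vectors sentence embedding, the cosine nearest-neighbor matching, and the `threshold` cutoff are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mine_pseudo_parallel(src_sents, tgt_sents, word_emb, threshold=0.5):
    """Pair sentences across languages using only cross-lingual word vectors.

    src_sents / tgt_sents: lists of tokenized sentences (lists of strings).
    word_emb: dict mapping a token to its cross-lingual NumPy vector.
    """
    dim = len(next(iter(word_emb.values())))

    def embed(tokens):
        # Sentence vector = mean of the cross-lingual word vectors it contains.
        vecs = [word_emb[t] for t in tokens if t in word_emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    S = np.stack([embed(s) for s in src_sents]).astype(float)
    T = np.stack([embed(t) for t in tgt_sents]).astype(float)

    # L2-normalize so the dot product equals cosine similarity.
    S /= np.linalg.norm(S, axis=1, keepdims=True) + 1e-8
    T /= np.linalg.norm(T, axis=1, keepdims=True) + 1e-8
    sim = S @ T.T

    # Keep each source sentence's nearest target, filtered by confidence.
    pairs = []
    for i, j in enumerate(sim.argmax(axis=1)):
        if sim[i, j] >= threshold:
            pairs.append((src_sents[i], tgt_sents[j]))
    return pairs
```

In the framework the abstract describes, a mining step like this would alternate with sentence-encoder training and word-embedding tuning, with each round re-mining pairs using the improved representations.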