Abstract
Domain adaptation in neural machine translation (NMT) often involves working with datasets whose distribution differs from that of the training data. In such scenarios, k-nearest-neighbor machine translation (kNN-MT) has been shown to be effective at retrieving relevant information from large datastores. However, the high-dimensional context vectors of large NMT models result in high computational costs for distance computation and storage. To address this issue, index optimization techniques have been proposed, including the combination of an inverted file index (IVF) with product quantization (PQ), known as IVFPQ. In this paper, we explore recent indexing techniques for efficient machine translation domain adaptation and combine multiple index structures to improve the efficiency of nearest-neighbor search on domain adaptation datasets for machine translation. Specifically, we evaluate the effectiveness of combining optimized product quantization (OPQ) and hierarchical navigable small-world (HNSW) indexing with IVFPQ. Our study aims to provide insight into the most suitable composite index methods for efficient nearest-neighbor search on domain adaptation datasets, with a focus on improving both accuracy and speed.
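As a concrete illustration of the composite index structures named above, the following sketch builds a plain IVFPQ index and an OPQ + HNSW-accelerated IVFPQ variant with the Faiss library. The dimensionality, datastore size, number of inverted lists, and sub-quantizer settings are illustrative assumptions for the sketch, not the configuration reported in the paper.

import numpy as np
import faiss  # approximate nearest-neighbor search library

# Hypothetical settings: dimensionality and datastore size are placeholders only.
d = 1024                                             # assumed dimensionality of NMT context vectors
keys = np.random.rand(100_000, d).astype("float32")  # stand-in for datastore key vectors
queries = np.random.rand(16, d).astype("float32")    # stand-in for decoder query vectors

# Baseline composite index: inverted file (IVF) coarse quantizer + product quantization (PQ).
ivfpq = faiss.index_factory(d, "IVF1024,PQ64")
ivfpq.train(keys)
ivfpq.add(keys)

# Variant of interest: an OPQ rotation applied before PQ, with an HNSW graph
# accelerating the IVF coarse quantizer.
composite = faiss.index_factory(d, "OPQ64,IVF1024_HNSW32,PQ64")
composite.train(keys)
composite.add(keys)

# At query time, nprobe trades accuracy for speed by scanning more inverted lists.
faiss.extract_index_ivf(composite).nprobe = 32
distances, neighbors = composite.search(queries, 8)  # retrieve 8 nearest neighbors per query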