Abstract

Word embeddings are widely deployed across fundamental natural language processing applications and are also useful for building representations of sentences, paragraphs, and documents. Because they form a core component of many natural language processing systems, reducing their size is beneficial in memory-constrained settings: lower-dimensional word vectors make such systems far more practical on memory-limited devices, with gains in many real-world applications. This article presents a comparative study of dimensionality reduction techniques for generating efficient lower-dimensional word vectors. In empirical experiments on an Arabic machine translation task, we found that a post-processing algorithm combined with independent component analysis outperforms the other dimensionality reduction techniques considered. This yields a new combination, post-processing followed by independent component analysis, that has not been investigated before. Applied to both contextual and non-contextual word embeddings, it reduces the size of the vectors while achieving better translation quality than the original embeddings.
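As a concrete illustration of such a pipeline, the sketch below pairs one common embedding post-processing scheme (mean subtraction followed by removal of the top principal components, in the spirit of Mu and Viswanath's "all-but-the-top" method) with FastICA from scikit-learn. This is a minimal sketch under stated assumptions, not the study's exact procedure: the embedding matrix, the number of removed components `d`, and the target dimensionality are illustrative placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Hypothetical embedding matrix: one row per vocabulary word.
# The shape (10000 x 300) is an illustrative assumption.
X = np.random.randn(10000, 300)

def post_process(X, d=7):
    """Post-process embeddings by subtracting the mean vector and
    removing projections onto the top-d principal components, which
    tend to dominate word-embedding geometry (all-but-the-top style).
    The choice d=7 is an illustrative assumption."""
    X_centered = X - X.mean(axis=0)
    pca = PCA(n_components=d).fit(X_centered)
    # Subtract the projection onto each dominant direction.
    return X_centered - X_centered @ pca.components_.T @ pca.components_

# Step 1: post-process the original embeddings.
X_pp = post_process(X)

# Step 2: reduce dimensionality with independent component analysis.
# The target size of 150 dimensions is an illustrative assumption.
ica = FastICA(n_components=150, random_state=0)
X_reduced = ica.fit_transform(X_pp)

print(X_reduced.shape)  # (10000, 150): lower-dimensional word vectors
```

The reduced vectors would then replace the originals in the downstream task (here, machine translation), trading dimensionality for memory footprint.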
