Abstract

Word embeddings based on a conditional model are commonly used in Natural Language Processing (NLP) tasks to embed the words of a dictionary in a low-dimensional linear space. Their computation is based on maximizing the likelihood of a conditional probability distribution for each word of the dictionary. These distributions form a Riemannian statistical manifold, on which word embeddings can be interpreted as vectors in the tangent space at a specific reference measure. A novel family of word embeddings, called α-embeddings, has recently been introduced; it derives from a geometrical deformation of the simplex of probabilities through a parameter α, using notions from Information Geometry. After introducing the α-embeddings, we show how the deformation of the simplex, controlled by α, provides an extra handle to increase performance on several intrinsic and extrinsic NLP tasks. We test the α-embeddings on different tasks with models of increasing complexity, showing that the advantages associated with the use of α-embeddings persist also for models with a large number of parameters. Finally, we show that tuning α yields higher performance than using larger models in which an additional transformation of the embeddings is learned during training, as experimentally verified in attention models.
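As a hedged illustration of the kind of simplex deformation the abstract refers to, the sketch below implements Amari's standard α-representation from Information Geometry, which maps a probability vector into a chart parameterized by α. The function name and the exact chart are assumptions based on the standard textbook definition, not necessarily the precise construction used in the paper.

```python
import numpy as np

def alpha_representation(p, alpha):
    """Amari's alpha-representation of a probability vector p (assumed form).

    For alpha != 1 the map is l_alpha(p) = 2/(1-alpha) * p**((1-alpha)/2);
    for alpha = 1 it degenerates to the logarithmic chart log(p).
    """
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):
        return np.log(p)
    return 2.0 / (1.0 - alpha) * p ** ((1.0 - alpha) / 2.0)

# For alpha = -1 the exponent is 1 and the prefactor is 1, so the map
# reduces to the identity: the mixture (linear) chart on the simplex.
p = np.array([0.2, 0.3, 0.5])
print(alpha_representation(p, -1.0))  # → [0.2 0.3 0.5]
```

Varying α interpolates between the mixture chart (α = −1) and the exponential chart (α = 1), which is the extra degree of freedom the abstract describes tuning.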
