Abstract

The work presented in this thesis investigates the possibility of combining text corpora and Knowledge Bases (KBs) for learning word representations. More specifically, the aim was to propose joint approaches that leverage the two types of resources to enhance word meaning representations. The main research question was: “Is it possible to enhance word representations by jointly incorporating text corpora and KBs into the word representation learning process? If so, what aspects of word meaning can be enhanced by combining those two types of resources?” The primary contribution of the thesis is three joint approaches for learning word representations: (i) Joint Representation Learning for Additional Evidence (JointReps), (ii) Joint Hierarchical Word Representation (HWR) and (iii) Sense-Aware Word Representations (SAWR). JointReps was designed to improve the overall semantic representation of words. To this end, it supplemented the co-occurrence statistics in the corpus with additional evidence from a KB. In particular, JointReps required two words that are in a particular semantic relationship in the KB to have similar word representations. The HWR approach was then proposed to learn word representations in a specific order so as to encode the hierarchical information in a KB in the learnt representations. HWR considered not only the hypernym relations that exist between words in a KB, but also contextual information in a text corpus. Specifically, given a training corpus and a KB, HWR learnt word representations that simultaneously encoded the hierarchical structure in the KB and the co-occurrence statistics between pairs of words in the corpus. A particularly novel aspect of the HWR approach was that it exploited the full hierarchical path of words existing in the KB.
The SAWR approach was then introduced to consider not only word representations but also the different senses (different meanings) associated with each word. SAWR required the learnt representations to predict both the word and its senses accurately. It learnt the sense-aware word representations jointly using both unlabelled and sense-labelled text corpora. The approaches were comprehensively analysed and evaluated on various standard and newly proposed tasks using a wide range of benchmark datasets. The evaluation compared the quality of the word representations learnt by the proposed approaches with that of word representations learnt by sole-resource baselines and by previously proposed joint approaches in the literature. All the proposed joint approaches proved effective in enhancing the learnt word representations. More specifically, they were found to report significant improvements over approaches that use only one type of resource and over the previously proposed joint approaches.
