Abstract

Representation of language is the first and critical task for Natural Language Understanding (NLU) in a dialogue system. Pretraining, embedding model, and fine-tuning for intent classification and slot-filling are popular and well-performing approaches but are time consuming and inefficient for low-resource languages. Concretely, the out-of-vocabulary and transferring to different languages are two tough challenges for multilingual pretrained and cross-lingual transferring models. Furthermore, quality-proved parallel data are necessary for the current frameworks. Stepping over these challenges, different from the existing solutions, we propose a novel approach, the Hypergraph Transfer Encoding Network “HGTransEnNet. The proposed model leverages off-the-shelf high-quality pretrained word embedding models of resource-rich languages to learn the high-order semantic representation of low-resource languages in a transductive clustering manner of hypergraph modeling, which does not need parallel data. The experiments show that the representations learned by “HGTransEnNet” for low-resource language are more effective than the state-of-the-art language models, which are pretrained on a large-scale multilingual or monolingual corpus, in intent classification and slot-filling tasks on Indonesian and English datasets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.