Abstract

The mixing of multiple language forms in spoken Chinese is common but problematic. It makes intent understanding harder and hinders communication. Existing studies on intent recognition focus mainly on a single language form or on parallel multilingual text, and pay little attention to spoken texts that contain multiple language forms. Because the semantics of an expression with multiple language forms are hard to capture, the problem deserves study. To address it, we propose a text representation model for spoken Chinese expressions mixed with English and Chinese Pinyin. A feature matrix is built to mine the composition information of English and Pinyin. The model can efficiently distinguish English from Chinese Pinyin even though both kinds of fragment are written in English letters. It also handles hidden text information effectively, because the problem is recast as the task of translating English and Pinyin into Chinese. To verify the model's performance, the texts it processes are used as the input to a classifier. Extensive experiments on a large corpus from online manual customer service in the logistics domain show that the text representation model is correct and effective. It not only removes the obstacles caused by mixing multiple language forms but also yields better results for intent understanding.
