Abstract

Human-computer dialogue has recently attracted extensive attention from both academia and industry as an important branch in the field of artificial intelligence (AI). However, there are few studies on the evaluation of large-scale Chinese human-computer dialogue systems. In this paper, we introduce the Second Evaluation of Chinese Human-Computer Dialogue Technology, which focuses on the identification of a user's intents and the intelligent processing of intent words. The Evaluation consists of user intent classification (Task 1) and online testing of task-oriented dialogues (Task 2), the data sets of which are provided by iFLYTEK Corporation. The evaluation tasks and data sets are described in detail, and the evaluation results and the remaining problems in the evaluation are discussed.

Highlights

  • With the development of artificial intelligence, human-computer dialogue technology has become increasingly popular and has attracted growing attention [1]

  • The Evaluation consists of user intent classification (Task 1) and online testing of task-oriented dialogues (Task 2), the data sets of which are provided by iFLYTEK Corporation

  • In order to avoid the imbalance of category distribution and take into account each category, we evaluate submitted systems based on the F1-measure obtained from precision and recall


Summary

INTRODUCTION

With the development of artificial intelligence, human-computer dialogue technology has become increasingly popular and has attracted growing attention [1]. Task 1 of the Evaluation, held at the 17th China National Conference on Computational Linguistics (CCL2018), is a user intent classification task in the customer service field based on Chinese corpora. The organizers provide open data so that participants can build their systems, which are then tested on hidden data sets. In DSTC6, participants needed to build a system that responds to a user's utterances based on the context of the conversation, and they could use external data; both objective and subjective indicators were used to evaluate the submitted systems [11]. In Task 2, the submitted systems should complete tasks such as ticket inquiry or reservation through online real-time dialogues with human testers. The Evaluation therefore combines automatic evaluation (for the user intent classification task) with online manual testing (for the online testing of task-oriented dialogues).

Task 1
Task 2
EVALUATION OF DATA SETS
Analysis
CONCLUSION

