Abstract

It is increasingly common to use chatbots as an interface to services. One of the main components of a chatbot is the Natural Language Understanding (NLU) model, which is responsible for interpreting the text and extracting the intent and entities present in that text. It is possible to focus on only one of these NLU tasks, such as intent classification. Training an NLU intent classification model generally requires a considerable amount of annotated data, in which each sentence of the dataset receives a label indicating an intent. Manually labeling data is arduous and, depending on the data volume, impracticable. Thus, an unsupervised machine learning technique, such as data clustering, could be applied to find and label patterns in the data. For this task, it is essential to have an effective vector embedding representation of texts that captures the semantic information and helps the machine understand the context, intent, and other nuances of the entire text. This paper extensively evaluates different text embedding models for clustering and labeling. We also apply some operations to improve the dataset's quality, such as removing sentences and establishing various strategies for distance thresholds (cosine similarity) relative to the clusters' centroids. Then, we trained several intent classification models with two different architectures, one built with the Rasa framework and the other with a neural network (NN), using the attendance texts from the Coronavirus Platform Service of Ceará, Brazil. We also manually annotated a dataset to be used as validation data. We conducted a study on semiautomatic labeling, implemented through clustering and visual inspection, which introduced some labeling errors into the intent classification models. However, manually annotating the entire dataset would have been unfeasible. Nevertheless, the trained models still achieved competitive accuracy.
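The abstract does not name the clustering algorithm or the threshold values, so the following is only a minimal sketch of the general idea: cluster sentence embeddings by cosine similarity, discard sentences that lie too far from their cluster's centroid, and attach an intent label to each remaining cluster. Several choices here are assumptions: spherical k-means with farthest-first seeding is a swapped-in technique, the 3-dimensional vectors stand in for real embedding-model output, the 0.9 threshold is arbitrary, and the sentences and intent names (`schedule_test`, `report_symptoms`) are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def farthest_first_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly add the
    vector least similar to every centroid chosen so far."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def spherical_kmeans(vectors, k, max_iters=50):
    """k-means on unit vectors using cosine similarity; returns (assignments, centroids)."""
    centroids = farthest_first_init(vectors, k)
    assign = None
    for _ in range(max_iters):
        # Assign each vector to the centroid with the highest cosine similarity.
        new_assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_assign == assign:
            break  # converged: no vector changed cluster
        assign = new_assign
        # Recompute each centroid as the re-normalized mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return assign, centroids


def filter_by_threshold(vectors, assign, centroids, threshold):
    """Keep indices of vectors whose similarity to their own centroid is >= threshold."""
    return [i for i, (v, a) in enumerate(zip(vectors, assign))
            if cosine(v, centroids[a]) >= threshold]


# Hypothetical toy data: real embeddings would come from a text embedding model.
sentences = [
    "how do I book a covid test",
    "where can I get tested",
    "I have a fever and cough",
    "my throat hurts",
    "ok thanks",
]
raw = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.1], [0.2, 0.8, 0.0], [0.5, 0.5, 0.6]]
vectors = [normalize(v) for v in raw]

assign, centroids = spherical_kmeans(vectors, k=2)
kept = filter_by_threshold(vectors, assign, centroids, threshold=0.9)

# Hypothetical intent names, as if chosen by visually inspecting each cluster.
cluster_names = {0: "schedule_test", 1: "report_symptoms"}
labeled = [(sentences[i], cluster_names[assign[i]]) for i in kept]
print(labeled)
```

With these toy vectors, the ambiguous sentence "ok thanks" is assigned to the second cluster but falls below the 0.9 similarity threshold, so it is dropped rather than given a possibly wrong label. Raising the threshold trades dataset size for label purity, which is the kind of trade-off the distance-threshold strategies in the abstract explore.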
