Preprocessing Text Data for the Doc2Vec Model in the Automatic Classification of Tickets in Support Centers of Software Providers

O M Shatalova, A A Kuzmin

doi:10.22213/2410-9304-2024-3-103-112

Abstract

Leading software providers typically offer customer technical support, which is crucial for promoting their products and services and keeping them competitive in global markets. The high volume and heterogeneity of support tickets (functional, temporal, linguistic, etc.) make efficient classification important. Effective classification optimizes the distribution of tickets among support center specialists and allows their processing to be automated using an established knowledge base. However, classifying these tickets is a loosely formalized task. For companies that have accumulated substantial data on customer requests, it becomes feasible to automate classification with machine learning methods and natural language processing models such as Word2Vec, FastText, BERT, and GPT. It is generally accepted that classification effectiveness depends primarily on the model employed. Nevertheless, the quality of these models is significantly influenced by the nature of the training data. A review of the literature reveals significant research interest in methods for automatic ticket classification tailored to the operating conditions of software provider support centers, but also a noticeable gap regarding the impact of data preprocessing on model quality. The article aims to clarify data preprocessing techniques and analyze their impact on the effectiveness of text classification, taking into account the specifics of software provider support centers. The study examines the stages of the automatic ticket classification process, accounting for the characteristics of the data (customer text requests). A corresponding set of methods and tools was developed and tested on open data from a global software provider (DevExpress). The testing involved a database of 165,000 tickets.
The study's results indicate that preprocessing can improve classification metrics such as F-measure, Precision, and Recall from 77% to 79%. Additionally, preprocessing significantly reduces the dimensionality of text data (by 48.2%) and increases model training speed (by 26.5%) without loss of accuracy, ensuring cost-efficiency and operational efficiency in the use of computational resources.
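
The abstract does not list the individual preprocessing steps, so the following is only a minimal sketch of what a ticket-cleaning pass and a vocabulary-size comparison might look like. The steps (lowercasing, stripping URLs and HTML tags, tokenizing, removing stop words), the stop-word list, and the sample tickets are all assumptions made for illustration and are not taken from the study.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "in", "to", "and", "i", "my", "it", "on", "of", "when"}

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z][a-z0-9_]*")


def preprocess(text):
    """Lowercase, strip URLs and HTML tags, tokenize, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def naive_tokens(text):
    """Baseline: whitespace split with no cleaning."""
    return text.split()


def vocabulary_size(token_lists):
    """Number of distinct tokens across all documents."""
    return len({tok for tokens in token_lists for tok in tokens})


# Made-up sample tickets, not data from the study.
tickets = [
    "The grid is NOT refreshing when I click <b>Save</b>, see https://example.com/a",
    "Grid not refreshing in the report designer",
]

raw_vocab = vocabulary_size(naive_tokens(t) for t in tickets)
clean_vocab = vocabulary_size(preprocess(t) for t in tickets)
reduction = 100.0 * (raw_vocab - clean_vocab) / raw_vocab
print(f"vocabulary: {raw_vocab} -> {clean_vocab} ({reduction:.1f}% smaller)")
```

The token lists returned by `preprocess` are the kind of input a Doc2Vec implementation would typically consume, one list per ticket. Negation words such as "not" are kept out of the stop-word list here because dropping them can change what a ticket means.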


Journal: Intellekt. Sist. Proizv.
