Abstract
Leading software providers typically operate customer technical support functions, which are crucial for promoting their products and services and keeping them competitive in global markets. The high volume and heterogeneity of support tickets (functional, temporal, linguistic, etc.) make efficient classification systems important. Effective classification optimizes the distribution of tickets among support center specialists and allows their processing to be automated using an established knowledge base. However, classifying these tickets is a loosely formalized task. For companies that have accumulated substantial data on customer requests, it becomes feasible to automate classification through machine learning methods and natural language processing models such as Word2Vec, FastText, BERT, and GPT. It is generally accepted that classification effectiveness depends primarily on the model employed. Nevertheless, the quality of these models is significantly influenced by the nature of the training data. A review of the literature reveals significant research interest in methods for the automatic classification of tickets tailored to the operational conditions of software provider support centers. However, there is a noticeable gap in the literature regarding the impact of data preprocessing on the quality of these models. This article aims to clarify the techniques of data preprocessing and to analyze their impact on the effectiveness of text classification, considering the specifics of software provider support centers. The study examines the stages of the automatic ticket classification process, accounting for the unique characteristics of the data (customer text requests). A relevant set of methodological and instrumental tools was developed and tested on open data from a global software provider (DevExpress). The testing involved a database of 165,000 tickets.
The study's results indicate that preprocessing can improve classification metrics such as F-measure, Precision, and Recall from 77% to 79%. Additionally, preprocessing significantly reduces the dimensionality of the text data (by 48.2%) and increases model training speed (by 26.5%) without loss of accuracy, which supports cost-effective and efficient use of computational resources.
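As a rough illustration of how preprocessing can reduce the dimensionality of ticket text, the following minimal sketch compares vocabulary size (the number of bag-of-words features) before and after cleaning. It is not the pipeline used in the study: the cleaning steps, the stop-word list, the function names, and the three sample tickets are all assumptions made for illustration, so the percentage it yields says nothing about the reported 48.2%.

```python
import re

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "to", "and", "of", "i", "my", "when"}


def raw_tokens(text):
    """Baseline: split on whitespace only, with no cleaning."""
    return text.split()


def preprocess(text):
    """Lowercase, strip HTML tags, keep alphanumeric tokens, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def vocabulary_size(tickets, tokenizer):
    """Count distinct tokens, i.e. the bag-of-words feature dimensionality."""
    return len({tok for ticket in tickets for tok in tokenizer(ticket)})


def reduction_percent(tickets):
    """Relative drop in vocabulary size caused by preprocessing, in percent."""
    before = vocabulary_size(tickets, raw_tokens)
    after = vocabulary_size(tickets, preprocess)
    return 100.0 * (before - after) / before


# Invented sample tickets (assumption, not taken from the DevExpress data).
tickets = [
    "<p>The Grid crashes when I sort the column.</p>",
    "Grid crashes on sort!",
    "My grid is crashing, the COLUMN sort fails",
]

if __name__ == "__main__":
    print(vocabulary_size(tickets, raw_tokens), "raw features")
    print(vocabulary_size(tickets, preprocess), "features after preprocessing")
    print(f"{reduction_percent(tickets):.1f}% reduction")
```

Case folding, markup removal, and punctuation stripping merge surface variants such as `Grid`, `grid`, and `sort!` into single features, which is where most of the reduction in this toy example comes from. A real pipeline would likely add steps such as lemmatization, which this sketch omits.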