IT Ticket Classification: The Simpler, the Better

Aleksandra Revina, Krisztian Buza, Vera G Meister

doi:10.1109/access.2020.3032840

Abstract

Recently, automatic classification of IT tickets has gained notable attention due to the increasing complexity of IT services deployed in enterprises. The research and practitioner communities have not reached a consensus on how to design IT ticket classification tasks, in particular on the choice of ticket text representation techniques and classification algorithms. Our study investigates the core design elements of a typical IT ticket text classification pipeline. In particular, we compare the performance of TF-IDF and linguistic feature-based text representations designed for ticket complexity prediction. We apply various classifiers, including kNN and its enhanced versions, decision trees, naïve Bayes, logistic regression, support vector machines, and semi-supervised techniques, to predict a ticket class label of low, medium, or high complexity. Finally, we discuss the evaluation results and their practical implications. As our study shows, the linguistic representation not only proves to be highly explainable but also yields a substantial increase in prediction quality over TF-IDF. Furthermore, our experiments provide evidence for the importance of feature selection. We show that even simple algorithms can deliver high-quality predictions when using appropriate linguistic features.
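To make the TF-IDF baseline concrete, the following is a minimal sketch of a TF-IDF representation combined with a kNN classifier, written in plain Python. It is not the authors' implementation: the toy tickets, their labels, the function names, the smoothed IDF variant, and the cosine-similarity neighbor search are all assumptions chosen for illustration. The linguistic features compared in the study are not specified in this summary and are therefore not shown.

```python
import math
from collections import Counter

# Hypothetical toy tickets with invented complexity labels (not study data).
TRAIN = [
    ("reset password for user account", "low"),
    ("unlock user account after failed login", "low"),
    ("install printer driver on laptop", "medium"),
    ("update printer firmware and driver", "medium"),
    ("migrate database cluster to new datacenter", "high"),
    ("plan database failover across datacenter sites", "high"),
]


def tokenize(text):
    """Lowercase whitespace tokenization; real pipelines usually do more."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors equals cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=1):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(text, k=1):
    """Fit TF-IDF on the toy tickets and predict a complexity label for text."""
    docs = [tokenize(t) for t, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    idf = fit_idf(docs)
    vecs = [tfidf_vector(d, idf) for d in docs]
    return knn_predict(tfidf_vector(tokenize(text), idf), vecs, labels, k=k)


if __name__ == "__main__":
    print(classify("please reset my password"))
    print(classify("migrate the database to another datacenter"))
```

Each step of this sketch can be inspected by hand: the IDF weights show which terms drive a prediction, and the nearest training ticket explains the assigned label. This illustrates why simple, transparent components are attractive in this setting.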

Highlights

With today's increased digitization, any enterprise maintains a broad application portfolio, which has often grown historically.
We observed that the classifiers' performance with linguistic features was almost always higher than with TF-IDF under comparable conditions.
Our work aimed to provide a comparative analysis of text representation techniques and classifiers while developing an IT ticket classification pipeline.

Summary

Introduction

With today's increased digitization, any enterprise maintains a broad application portfolio, which has often grown historically. Such a portfolio must be supported by large-scale, complex IT service environments [1]. These developments reveal the fundamental role of IT support systems in any organization's support operations. While small companies still tend to perform these steps manually, large organizations dedicate large budgets to implementing various commercial text classification solutions. These solutions are sophisticated, monolithic software packages that prioritize accuracy at the cost of explainability and understandability. This trade-off guides our choice of text representation and classification techniques.

Objectives

Methods

Results

Discussion

Conclusion