Abstract

Text categorization is a supervised learning task that assigns labels to documents using a classifier trained on a set of labelled documents. Applying text classification to label reports and complaints in the economic and health-related fields can greatly increase the speed at which they are processed and therefore reduce the time required to act upon them. In this work, we aim to classify complaints into the four main economic activities defined by the Portuguese Economic and Food Safety Authority. We evaluate the classification performance of nine algorithms (Complement Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, AdaBoost and Logistic Regression) at different levels of text preprocessing. Results reveal high accuracy, around 85%. The linear classifiers (Support Vector Machine and Logistic Regression) also achieved higher F1-measure values than the other classifiers, in addition to high accuracy. We conclude that these algorithms are better suited to the selected data, and that text classification methods can facilitate the processing of complaints and reports, which in turn leads to swifter action by the authorities in charge. Thus, relying on text classification of reports and complaints can have a positive influence on economic crime prevention and on public health, in this case by means of food-related inspections.
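To illustrate why the F1-measure is worth reporting alongside accuracy in a four-class setting, the following is a minimal sketch in plain Python. The labels "A" to "D", the toy data, and the choice of macro averaging are all assumptions made for illustration; they are not taken from the study, which does not state how its F1 values were averaged.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: "A".."D" stand in for four categories.
y_true = ["A"] * 7 + ["B", "C", "D"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["A"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

On this toy data the always-majority classifier reaches a fairly high accuracy while its macro F1 stays low, because three of the four classes are never predicted correctly. A classifier that scores well on both measures is therefore stronger evidence of good performance across all categories.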

