Abstract

Purpose: This paper proposes a novel way of using textual clustering as a feature selection method. The method is applied to identify the most important keywords for profile classification and is demonstrated on the problem of detecting sick-leave promoters on Twitter.

Design/methodology/approach: Four machine learning classifiers were applied to a total of 35,578 tweets. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared between the proposed clustering-based feature selection approach and standard feature selection.

Findings: Random forest achieved the highest accuracy, 95.91%, which is higher than that of the similar work it was compared with. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoter accounts that exhibit spam-like behavior.

Research limitations/implications: Because the method is novel, more testing on other datasets is needed before its results can be generalized.

Practical implications: The model can be used by Saudi authorities to report on accounts that sell sick leaves online.

Originality/value: The research proposes a new way in which textual clustering can be used for feature selection.
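
Sensitivity (recall), the measure the abstract singles out, is the share of true promoter accounts that the classifier actually flags: TP / (TP + FN). The snippet below is a minimal sketch of that calculation only. The function name `sensitivity`, the label strings, and the toy labels are assumptions made for illustration; they are not the paper's code or data.

```python
def sensitivity(y_true, y_pred, positive="promoter"):
    """Recall for the positive class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: promoters that the classifier also labeled as promoters.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: promoters that the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Toy labels invented for illustration (not the paper's data).
y_true = ["promoter", "promoter", "promoter", "promoter",
          "nonpromoter", "nonpromoter"]
y_pred = ["promoter", "promoter", "promoter", "nonpromoter",
          "nonpromoter", "promoter"]

print(sensitivity(y_true, y_pred))  # 3 of 4 promoters found -> 0.75
```

Note that the false positive in the toy data (a nonpromoter flagged as a promoter) does not affect the result, which is why sensitivity suits a task where missing a promoter is the costlier error.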

Highlights

  • When not reporting to work, employees are expected to present proof if they claim to have had a medical condition

  • Sick leaves are documents provided by medical facilities issued by a doctor certifying that the person is suffering from a condition that allows them days off

  • In Saudi Arabia, employee absenteeism has been an issue for some time



Introduction

When not reporting to work, employees are expected to present proof if they claim to have had a medical condition. Sick leaves are documents provided by medical facilities and issued by a doctor, certifying that the person is suffering from a condition that entitles them to days off. Some employees and students abuse this allowance and obtain such documents illegally to get free days off. The government is combating the issuance of these documents through laws and regulations [1]. Despite these efforts, documents of this type are still being circulated. Promoters are accounts that sell these documents on social media. Because selling them is illegal, most of these accounts are either fake or pseudonymous. Spammers tend to repeat the exact same text multiple times within a short period of time [1].
