Abstract

Social media has become a mainstream tool for social engagement, entertainment, and even news sharing. As a result, many people subscribe to one or more social media platforms. As on any social platform with diverse members, there is a need for clean conversation free of bullying, profanity, and related social ills. Social media platforms therefore require scalable, robust, and accurate techniques for detecting profane words in order to minimize their use and publication. Despite ongoing efforts to develop techniques for profane word detection, poor accuracy remains a challenge. To address this challenge, this paper develops a data-driven approach to algorithm selection, herein called the Data-driven Ensemble Model (DEM). The approach consists of algorithm tuning, extreme feature engineering, crowd-sourcing analytics, and ensemble method selection. Raw data from the Twitter social media platform was used to censor profane words, and the WEKA data mining software was used to conduct the evaluation experiments. The experiments covered data preprocessing, crowd-sourcing evaluation, ensemble method selection, algorithm tuning, model building, and performance evaluation. The results showed that DEM achieved a higher average accuracy (94.94%) than the baseline method, Support Vector Machines (SVM) (93.33%), on three profane-word datasets.
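As a rough illustration of the ensemble idea only, the sketch below combines three simple classifiers by majority vote. It is not the DEM pipeline: the abstract does not state how the ensemble members are combined, so majority voting is an assumption here, and the toy lexicon, the three classifier functions, and the example texts are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical toy lexicon; NOT taken from the paper's datasets.
TOY_LEXICON = {"darn", "heck"}

# A classifier maps a text to 1 ("profane") or 0 ("clean").
Classifier = Callable[[str], int]


def lexicon_classifier(text: str) -> int:
    """Flag a text if any whitespace-separated token is in the lexicon."""
    return int(any(tok in TOY_LEXICON for tok in text.lower().split()))


def substring_classifier(text: str) -> int:
    """Flag a text if a lexicon entry occurs anywhere in it."""
    lowered = text.lower()
    return int(any(word in lowered for word in TOY_LEXICON))


def normalized_classifier(text: str) -> int:
    """Undo simple character substitutions and strip punctuation, then match."""
    table = str.maketrans({"4": "a", "3": "e", "0": "o", "@": "a"})
    substituted = text.lower().translate(table)
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in substituted
    )
    return lexicon_classifier(cleaned)


def majority_vote(classifiers: Sequence[Classifier], text: str) -> int:
    """Return the label chosen by most classifiers (assumed combination rule)."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


# An odd number of members avoids ties for binary labels.
ENSEMBLE = [lexicon_classifier, substring_classifier, normalized_classifier]

if __name__ == "__main__":
    for sample in ["what the heck!", "hello there", "oh heck no"]:
        print(sample, "->", majority_vote(ENSEMBLE, sample))
```

In this sketch the exact-token classifier misses "heck!" because of the trailing punctuation, but the other two members outvote it, which is the kind of error correction an ensemble is meant to provide.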
