Abstract

A deliberate falsehood fabricated to look like the truth, commonly called a hoax (from "hocus", to trick), has been spreading at an alarming rate. Hoaxes can cause anxiety and panic in society. Even when a hoax poses no real threat, the perceptions it spreads can affect both social and political conditions. The image a hoax creates can have negative effects and interfere with state policy, potentially harming the economy. Early detection of hoaxes helps the Government reduce, or even eliminate, their spread. Some existing systems filter hoaxes based on article titles, or on votes gathered from search-engine queries. This research develops an Indonesian hoax filter that represents text as vectors weighted by term frequency and document frequency, and then applies classification techniques to those vectors. Of the several available classifiers, this research uses Support Vector Machine (SVM) and Stochastic Gradient Descent (SGD). SVM separates word vectors with a linear function, while SGD separates them with a nonlinear function. SVM and SGD were chosen because text classification involves high-dimensional matrices. Each word in a news article can be modeled as a feature, and with Linear SVC and SGD the word-vector features can be reduced to two dimensions and separated with linear and non-linear boundaries. The highest accuracy, 86%, was obtained by the SGD classifier with the modified Huber loss. It was evaluated on 100 hoax and 100 non-hoax websites chosen at random from outside the training dataset.
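The pipeline described above has two parts: a term-frequency/document-frequency vector for each article, and a classifier trained with SGD using the modified Huber loss. The sketch below shows both parts in dependency-free Python. Its assumptions should be read loosely. The weighting is assumed to be the common smoothed TF·IDF. The learner fits a linear decision function w·x + b with plain SGD. The helper names (`fit_idf`, `vectorize`, `train_sgd`, `predict`) and the six Indonesian sentences are hypothetical illustrations, not the authors' code or dataset. The loss names echo scikit-learn's `LinearSVC` and `SGDClassifier(loss="modified_huber")`, but the abstract does not say which library was used.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real Indonesian pipeline would also
    # strip punctuation and could remove stop words or apply stemming.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency: each term once per doc
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(doc, idf):
    """Sparse {term: tf * idf} vector, L2-normalised; unseen terms are dropped."""
    tf = Counter(tokenize(doc))
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def decision(w, b, x):
    """Linear decision function f(x) = w . x + b over sparse vectors."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def modified_huber_grad(z):
    """Derivative of the modified Huber loss with respect to the margin z = y * f(x)."""
    if z >= 1.0:
        return 0.0
    if z >= -1.0:
        return -2.0 * (1.0 - z)  # quadratic region: loss = (1 - z)^2
    return -4.0  # linear region: loss = -4z


def train_sgd(X, y, epochs=200, lr=0.1, alpha=1e-4, seed=0):
    """Plain SGD on modified Huber loss plus a small L2 penalty; labels are +1/-1."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, yi = X[i], y[i]
            g = modified_huber_grad(yi * decision(w, b, x)) * yi  # dLoss/df
            for t in w:  # L2 weight decay on every stored weight
                w[t] *= 1.0 - lr * alpha
            for t, v in x.items():  # gradient step on the active features
                w[t] = w.get(t, 0.0) - lr * g * v
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return +1 (hoax) or -1 (non-hoax)."""
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical toy corpus: +1 = hoax, -1 = non-hoax (not the paper's data).
docs = [
    "pesan berantai vaksin mengandung chip pelacak sebarkan segera",
    "viral air garam menyembuhkan semua penyakit sebarkan",
    "sebarkan info ini uang gratis dari pemerintah klik tautan",
    "pemerintah mengumumkan jadwal vaksinasi di puskesmas",
    "bank indonesia mempertahankan suku bunga acuan bulan ini",
    "kementerian kesehatan merilis data kasus mingguan",
]
labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(docs)
X = [vectorize(d, idf) for d in docs]
w, b = train_sgd(X, labels)
print(predict(w, b, vectorize("sebarkan segera info viral", idf)))
```

Sparse dictionaries keep the sketch self-contained. A corpus of thousands of articles would normally use a library vectorizer and classifier instead. Those typically add their own defaults for tokenization, regularization strength and learning-rate schedule.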
