Abstract
On social media, users share ideas and opinions with their friends and neighbours. Spammers send spam to genuine users in order to mislead them, and this spam is a serious problem on social media sites. Researchers have developed various methods for detecting spam messages in social media, and these methods typically build their models from a large number of features. However, the original dataset generally contains many irrelevant and redundant features, and such a large number of features reduces spam detection accuracy. To improve spam detection accuracy in social media networks, the uninformative attributes must be removed from the high-dimensional social media dataset. To reduce the dimensionality of the dataset, we use principal component analysis (PCA), a dimensionality reduction approach. After dimensionality reduction, the samples are classified with the decision tree induction algorithm and the K-nearest neighbour (KNN) algorithm. In the proposed work, these algorithms determine whether each sample is spam or ham. We test the proposed approach on a Twitter dataset. Experimental results show that the KNN classifier outperforms the decision tree classifier.
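The pipeline described above (reduce the feature space with PCA, then classify the projected samples) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The feature values, cluster centres, and helper names (`make_samples`, `top_components`, `knn_predict`) are hypothetical, the data is synthetic rather than the Twitter dataset, PCA is computed by power iteration with deflation rather than a library routine, and only the KNN step is shown; the decision tree classifier is omitted for brevity.

```python
import math
import random
from collections import Counter

def make_samples(centre, n, rng):
    """Synthetic rows: three noisy features plus one redundant feature."""
    rows = []
    for _ in range(n):
        a, b, c = (m + rng.gauss(0, 0.3) for m in centre)
        rows.append([a, b, c, 2 * a])  # 4th feature duplicates the 1st (redundant)
    return rows

def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def centre_rows(X, means):
    return [[x - m for x, m in zip(row, means)] for row in X]

def covariance(Xc):
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(C, k, iters=200, seed=0):
    """Leading k eigenvectors of C via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]  # work on a copy
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit vector v.
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        for i in range(d):  # deflate: remove the component just found
            for j in range(d):
                C[i][j] -= lam * v[i] * v[j]
    return comps

def project(Xc, comps):
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in Xc]

def knn_predict(train_Z, train_y, z, k=3):
    """Majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(zip(train_Z, train_y), key=lambda p: math.dist(p[0], z))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

rng = random.Random(42)
# Hypothetical cluster centres for ham and spam in the 3 base features.
X_train = make_samples((0.5, 1.0, 3.0), 20, rng) + make_samples((5.0, 6.0, 0.5), 20, rng)
y_train = ["ham"] * 20 + ["spam"] * 20

means = column_means(X_train)
components = top_components(covariance(centre_rows(X_train, means)), k=2)
Z_train = project(centre_rows(X_train, means), components)

# New samples are centred with the training means before projection.
X_new = [[0.4, 1.2, 2.8, 0.8], [5.2, 5.7, 0.6, 10.4]]
Z_new = project(centre_rows(X_new, means), components)
preds = [knn_predict(Z_train, y_train, z) for z in Z_new]
print(preds)
```

Here four features, one of them redundant, are reduced to two principal components before classification. On real data, the number of retained components and the value of `k` would need to be chosen by validation.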