Abstract

Machine learning (ML) can predict outcomes with high accuracy once models have been trained and tested on suitable datasets. Social networks are one domain where ML can be applied effectively to protect the authenticity and security of legitimate users. As the use of Online Social Networks (OSNs) grows, spam and malicious activity have become widespread, and Sybil nodes are one such safety and security hazard. Detecting Sybil accounts is difficult because they closely mimic the behavior of genuine human accounts. In this paper, we examine Sybil accounts on the OSN Twitter, training machine learning models on existing datasets to detect these malicious users before they can disrupt communication among genuine users. Because the datasets are large, feature selection is applied as a pre-processing step before classification, which helps improve model performance. Support Vector Machine–Recursive Feature Elimination (SVM-RFE) and Logistic Regression–Recursive Feature Elimination (LR-RFE) are used to select the significant features. The classification models are then trained on the selected features using the Random Forest (RF) and K-Nearest Neighbor (KNN) algorithms. We also analyze the biasing effect of fake accounts on the human-account datasets during feature selection and classification. The results show that RF outperforms KNN on the feature sets selected by both SVM-RFE and LR-RFE.
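The pipeline described in the abstract (RFE-based feature selection followed by RF and KNN classification) can be sketched roughly as follows. This is an illustrative sketch only, assuming scikit-learn is available: the synthetic data from `make_classification`, the function name `run_pipeline`, the number of retained features, and all hyperparameters are assumptions made for the example, not the datasets or settings used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_pipeline(n_features_to_select=5, seed=0):
    """Return test accuracy for each (selector, classifier) pair."""
    # Synthetic stand-in for account features (NOT the paper's Twitter data).
    X, y = make_classification(
        n_samples=400, n_features=20, n_informative=5,
        n_redundant=2, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Scale using training statistics only; linear SVM, LR and KNN
    # are all sensitive to feature scale.
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # RFE needs an estimator exposing coef_, hence the linear SVM kernel.
    selectors = {
        "SVM-RFE": RFE(SVC(kernel="linear"),
                       n_features_to_select=n_features_to_select),
        "LR-RFE": RFE(LogisticRegression(max_iter=1000),
                      n_features_to_select=n_features_to_select),
    }
    # Factories so each (selector, classifier) pair gets a fresh model.
    classifiers = {
        "RF": lambda: RandomForestClassifier(n_estimators=100,
                                             random_state=seed),
        "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    }

    results = {}
    for sel_name, selector in selectors.items():
        # Fit the selector on training data only to avoid leakage.
        selector.fit(X_train, y_train)
        X_train_sel = selector.transform(X_train)
        X_test_sel = selector.transform(X_test)
        for clf_name, make_clf in classifiers.items():
            clf = make_clf().fit(X_train_sel, y_train)
            results[(sel_name, clf_name)] = clf.score(X_test_sel, y_test)
    return results


if __name__ == "__main__":
    for (sel_name, clf_name), acc in run_pipeline().items():
        print(f"{sel_name} + {clf_name}: accuracy = {acc:.3f}")
```

Accuracies on this synthetic data say nothing about the paper's findings; the sketch only shows how the four selector and classifier combinations could be compared on a held-out split.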
