Abstract
As the internet grows, web spam is also increasing, and it significantly degrades the user experience with search engines. Web spam methods target a search engine's ranking algorithms to push targeted web sites to higher positions in the results. This paper proposes an intelligent oversampling approach based on general type-2 fuzzy sets that balances the class distribution and thereby improves classification performance for web spam detection. The proposed method is validated on the real-world benchmark dataset WEBSPAM-UK 2007, and its performance is assessed with AUC (area under the ROC curve), F-measure, and G-mean. It is compared with SMOTE in combination with 11 well-known base classifiers available in the WEKA tool. The computational complexity of the proposed method is the same as that of SMOTE. When combined with the base classifiers, the proposed method improves each classifier's performance and outperforms SMOTE in every case. The proposed combinations are also analyzed statistically using the Friedman, Holm, and Wilcoxon tests to identify the best combination among the 11 base classifiers. The analysis shows that the proposed method in combination with random forest (GT2FS-SMOTE+RF) performed best among all combinations.
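For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of how F-measure and G-mean are conventionally computed from confusion-matrix counts, with spam treated as the positive (minority) class. It uses the standard textbook definitions, not code from the paper; the function names `f_measure` and `g_mean` and the example counts are illustrative assumptions, and AUC is omitted because it requires ranked classifier scores rather than counts.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure (F1 when beta=1): weighted harmonic mean of precision and recall.

    Standard definition; names and signature are illustrative, not from the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fp, tn, fn):
    """G-mean: geometric mean of sensitivity (recall on the positive class)
    and specificity (recall on the negative class)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical imbalanced test set: 5 spam pages, 95 normal pages.
# A classifier that labels everything "normal" is 95% accurate,
# yet both measures drop to zero because no spam is detected.
all_normal = dict(tp=0, fp=0, tn=95, fn=5)
print(f_measure(all_normal["tp"], all_normal["fp"], all_normal["fn"]))
print(g_mean(**all_normal))

# A classifier that finds 4 of the 5 spam pages at the cost of 5 false alarms.
balanced = dict(tp=4, fp=5, tn=90, fn=1)
print(round(f_measure(balanced["tp"], balanced["fp"], balanced["fn"]), 4))
print(round(g_mean(**balanced), 4))
```

Both measures penalize a classifier that ignores the minority class, which is why they are commonly preferred over plain accuracy when evaluating oversampling methods on imbalanced data.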