Abstract

Advances in network technology have led to an exponential rise in the number of internet users across the globe. This growth in internet usage has been accompanied by an increase in both the number of malicious websites and the number of cybercrimes reported over the years. It has therefore become critical to devise an intelligent solution that can detect malicious websites and be used in real-time systems. In this paper, we perform a comparative analysis of various feature selection techniques to build a time-efficient and accurate predictive model. To build our predictive model, a set of features is selected by each feature selection method. Under every feature selection technique examined in this paper, the selected features consist of at least 70% of the categorical features. With real-time deployment of models as the end goal, this matters because processing or storing these features is far cheaper than processing or storing text- or image-based features. Our data was initially class-imbalanced, which we addressed using the Synthetic Minority Oversampling Technique (SMOTE). Our proposed model also outperformed existing work in the literature across various evaluation metrics. The results indicate that embedded feature selection was the best technique in terms of model accuracy. The filter-based technique may also be used to develop a low-latency system, at the cost of model accuracy.
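The oversampling step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the implementation used in the paper; the function names `smote` and `balance`, the toy data, and the default `k=5` are assumptions made for illustration. The interpolation assumes numeric features, so categorical features would first need an encoding or a categorical-aware variant, which the abstract does not specify.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; assumes numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until it matches the size of the rest of the data."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (made up): label 1 stands in for the minority "malicious" class.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.9, 1.0],
     [1.0, 0.8], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4]]
y = [0, 0, 0, 1, 1, 0, 0, 0]
X_bal, y_bal = balance(X, y, minority_label=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used for evaluation.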
