Abstract
Research into the use of machine learning techniques for network intrusion detection, especially carried out with respect to the popular public dataset, KDD cup 99, have become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of the associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be effectively used to enhance the performance of existing intrusion detection systems despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation to determine the underlying causes of the poor performance of some of the well-known machine learning classifiers. Especially when learning from minor classes/attacks. The main factor is that the KDD cup 99 dataset, which is popularly used in most of the existing research, is an imbalanced dataset due to the nature of the specific intrusion detection domain, i.e. some attacks being rare and some being very frequent. Therefore, there is a significant imbalance amongst the classes in the dataset. Based on the number of the classes in the dataset, the imbalance dataset issue can be considered a binary problem or a multi-class problem. Most of the researchers focus on conducting a binary class classification as conducting a multi-class classification is complex. In the research proposed in this paper, we consider the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms in order to overcome the common misclassification problems that have been faced by researchers who used the imbalance KDD cup 99 dataset for their investigations. Recommendations are made as for which classifier is best for the classification of imbalanced data.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.