Abstract
Anomaly detection is one approach to securing network traffic, but it is challenged by high-dimensional and imbalanced data, both of which can degrade the performance of a detection system. A feature selection technique is therefore needed to reduce dimensionality by eliminating irrelevant features, and the selected features must be validated with a suitable classification algorithm to achieve high anomaly detection performance. The purpose of this study is to identify a combination of feature selection technique and classification algorithm that can detect attacks in high-dimensional, imbalanced data. Chi-square feature selection was used to eliminate irrelevant features. To determine the most suitable classification algorithm, the study tested and compared several tree-based classifiers (REPTree, J48, Random Tree, and Random Forest) on their ability to detect anomalous traffic in high-dimensional, imbalanced data. The best-performing configuration is recommended as the ideal combination of feature selection technique and classification algorithm. The result is an anomaly detection system with high performance. The experiments used the CICIDS-2017 dataset, which is high-dimensional and imbalanced. The test results show that Random Tree achieved an accuracy of 99.983% and Random Forest 99.984%.
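To make the chi-square selection step concrete, the following is a minimal sketch of how a chi-square score can rank features against a class label. It is not the study's implementation: the function names `chi_square_score` and `select_top_k`, the toy feature names, and the toy data are all hypothetical, and it assumes the features are already categorical (continuous traffic features would first need to be discretized, which the abstract does not describe).

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels.

    A larger score means the feature's values are more strongly associated
    with the class, so the feature is less likely to be irrelevant.
    """
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts per (value, class)
    feature_totals = Counter(feature_values)         # marginal counts per feature value
    label_totals = Counter(labels)                   # marginal counts per class

    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            # Count expected if the feature and the class were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k highest-scoring features, plus all scores."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 benign flows and 2 attack flows (imbalanced on purpose).
labels = ["benign"] * 8 + ["attack"] * 2
columns = {
    "flag_set": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # tracks the class exactly
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}

selected, scores = select_top_k(columns, labels, k=1)
print(selected, scores)
```

In this toy case the feature that tracks the class receives a high score and the independent one scores zero, so only the former is kept. The reduced feature set would then be passed to whichever tree-based classifier is being evaluated.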