Abstract
Purpose
Timely intrusion detection in extensive traffic remains a pressing and complex challenge, including for Web services. Current research emphasizes improving detection accuracy through machine learning, with scant attention paid to the dataset’s impact on the capability for fast detection. Many datasets rely on flow-level features, requiring entire flow completion before determining if it constitutes an attack, reducing efficiency. This paper aims to introduce a new feature extraction method and construct a new security dataset that enhances detection efficiency.

Design/methodology/approach
This paper proposes a novel partial-flow feature extraction method that extracts packet-level features efficiently to reduce the high latency of flow-level extraction. The method also integrates statistical and temporal features derived from partial flows to improve accuracy. The method was applied to the original packet capture (PCAP) files utilized in creating the CSE-CIC-IDS 2018 dataset, resulting in the development of the WKLIN-WEB-2023 dataset specifically designed for web intrusion detection. The effectiveness of this method was evaluated by training nine classification models on both the WKLIN-WEB-2023 and CSE-CIC-IDS 2018 datasets.

Findings
The experimental results show that models trained on the WKLIN-WEB-2023 dataset consistently outperform those on the CSE-CIC-IDS 2018 dataset across precision, recall, F1-score, and detection latency. This demonstrates the superior effectiveness of the new dataset in enhancing both the efficiency and accuracy of intrusion detection.

Originality/value
This study proposes the partial-flow feature extraction method, creating the WKLIN-WEB-2023 dataset. This novel approach significantly enhances detection efficiency while maintaining classification performance, providing a valuable foundation for further research on intrusion detection efficiency.
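
To make the idea of partial-flow extraction concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `Packet` record, a unidirectional 5-tuple flow key, and a hypothetical `window` parameter giving the number of leading packets per flow; none of these details, nor the feature names, are specified in the abstract. A feature row is emitted as soon as a flow has produced `window` packets, rather than when the flow ends, and the row combines statistical features (packet lengths) with temporal features (inter-arrival gaps).

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative assumptions."""
    ts: float      # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    length: int    # packet length in bytes


def partial_flow_features(packets, window=3):
    """Return one feature row per flow, built from its first `window` packets."""
    seen = defaultdict(list)
    rows = []
    for p in sorted(packets, key=lambda p: p.ts):
        # Assumed flow key: unidirectional 5-tuple.
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        buf = seen[key]
        if len(buf) >= window:
            continue  # features for this flow were already emitted
        buf.append(p)
        if len(buf) == window:
            lengths = [q.length for q in buf]
            gaps = [b.ts - a.ts for a, b in zip(buf, buf[1:])]
            rows.append({
                "flow": key,
                "first_len": lengths[0],                 # packet-level feature
                "mean_len": mean(lengths),               # statistical features
                "std_len": pstdev(lengths),
                "mean_gap": mean(gaps) if gaps else 0.0,  # temporal features
                "max_gap": max(gaps) if gaps else 0.0,
                "ready_at": p.ts,  # time at which the row became available
            })
    return rows


demo = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 100),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 200),
    Packet(1.0, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 300),
    Packet(5.0, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 400),
]
demo_rows = partial_flow_features(demo, window=3)
```

In this toy input the row becomes available at the third packet's timestamp, while a flow-level extractor would have to wait for the fourth packet and the end of the flow. In this sketch, flows shorter than `window` packets never produce a row; a real pipeline would need a timeout or flush rule for them.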