Advanced Persistent Threats (APTs) pose considerable challenges in cybersecurity because of their evolving tactics and complex evasion techniques. These characteristics often defeat traditional security measures and call for more sophisticated detection methods. This study introduces Feature Evolution using Genetic Programming (FEGP), a novel method that leverages multi-tree Genetic Programming (GP) to construct and enhance features for APT detection. While GP has been widely applied to problems in many domains, our study focuses on adapting GP to the multifaceted landscape of APT detection. The proposed method automatically constructs discriminative features by combining the original features using mathematical operators. By leveraging GP, the system adapts to the evolving tactics employed by APTs and identifies APT activities with greater accuracy and reliability. To assess the efficacy of the proposed method, comprehensive experiments were conducted on widely used, publicly accessible APT datasets. Using the combination of constructed and original features on the DAPT-2020 dataset, FEGP achieved a balanced accuracy of 79.28%, surpassing the best comparative methods by an average of 2.12% in detecting APT stages. Additionally, using only constructed features on the Unraveled dataset, FEGP achieved a balanced accuracy of 83.14%, a 3.73% improvement over the best comparative method. These findings underscore the importance of GP-based feature construction for APT detection, providing a pathway toward improved accuracy and efficiency in identifying APT activities. The comparative analysis against existing feature construction methods demonstrates FEGP’s effectiveness as a state-of-the-art method for multi-class APT classification.
In addition to the performance evaluation, further analysis was conducted, including a feature importance analysis and a detailed time analysis.
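
To make the idea of constructing features by combining original features with mathematical operators more concrete, the following is a minimal sketch, not the FEGP implementation. The tree encoding, the operator set, the protected division, and all function names are assumptions made for illustration; the abstract does not specify them. The sketch also shows balanced accuracy computed as the mean of per-class recall, which is the usual definition of that metric.

```python
import operator


def protected_div(a, b):
    # Assumption: a common GP convention that avoids division by zero.
    return a / b if b != 0 else 1.0


# Hypothetical operator set; the abstract only says "mathematical operators".
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(tree, sample):
    """Evaluate one expression tree on one sample.

    A tree is either an int (index of an original feature) or a
    tuple (op, left, right) whose children are trees themselves.
    """
    if isinstance(tree, int):
        return sample[tree]
    op, left, right = tree
    return OPERATORS[op](evaluate(left, sample), evaluate(right, sample))


def construct_features(individual, sample, keep_original=True):
    """Apply a multi-tree individual: each tree yields one constructed feature."""
    constructed = [evaluate(tree, sample) for tree in individual]
    return list(sample) + constructed if keep_original else constructed


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Illustrative individual with two trees: f0 + f1 * f2 and f0 / f1.
individual = [("+", 0, ("*", 1, 2)), ("/", 0, 1)]
sample = [2.0, 4.0, 0.5]

combined = construct_features(individual, sample, keep_original=True)
only_constructed = construct_features(individual, sample, keep_original=False)
score = balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

The `keep_original` flag mirrors the two settings reported above: constructed plus original features, and constructed features only. In a GP-based method the trees would be evolved under a fitness function rather than written by hand as they are here.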