This investigation underscores the paramount imperative of discerning network intrusions as a pivotal measure to fortify digital systems and shield sensitive data from unauthorized access, manipulation, and potential compromise. The principal aim of this study is to leverage a publicly available dataset, employing a Genetic Programming Symbolic Classifier (GPSC) to derive symbolic expressions (SEs) endowed with the capacity for exceedingly precise network intrusion detection. In order to augment the classification precision of the SEs, a pioneering Random Hyperparameter Value Search (RHVS) methodology was conceptualized and implemented to discern the optimal combination of GPSC hyperparameter values. The GPSC underwent training via a robust five-fold cross-validation regimen, mitigating class imbalances within the initial dataset through the application of diverse oversampling techniques, thereby engendering balanced dataset iterations. Subsequent to the acquisition of SEs, the identification of the optimal set ensued, predicated upon metrics inclusive of accuracy, area under the receiver operating characteristics curve, precision, recall, and F1-score. The selected SEs were subsequently subjected to rigorous testing on the original imbalanced dataset. The empirical findings of this research underscore the efficacy of the proposed methodology, with the derived symbolic expressions attaining an impressive classification accuracy of 0.9945. If the accuracy achieved in this research is compared to the average state-of-the-art accuracy, the accuracy obtained in this research represents the improvement of approximately 3.78%. In summation, this investigation contributes salient insights into the efficacious deployment of GPSC and RHVS for the meticulous detection of network intrusions, thereby accentuating the potential for the establishment of resilient cybersecurity defenses.
Read full abstract