This project designs and implements an intrusion detection system (IDS) for software-defined networks (SDNs), which are attracting considerable interest in the networking community. By decoupling the control plane from the data plane, SDNs offer greater flexibility and programmability in network management, but they also introduce new security risks. Several machine learning algorithms, including Logistic Regression, Decision Tree, and Random Forest, are used to identify malicious activity in the network. The dataset, sdn_intrusion.csv, was preprocessed by handling missing values, encoding categorical data, and treating outliers. Feature selection was performed by analysing the correlation between the input features and the target variable, with the aim of improving model accuracy. The models were assessed with cross-validation and several metrics (accuracy, precision, recall, F1 score, and AUC), and the results are presented quantitatively and qualitatively as confusion matrices, ROC curves, and Precision-Recall curves. The findings indicate that the Extra Trees Classifier (extremely randomized trees) and Histogram-based Gradient Boosting models are effective at identifying intrusions in SDN architectures. In conclusion, the study highlights the importance of choosing an appropriate machine learning algorithm and preprocessing the data carefully when building an effective IDS for SDNs, which in turn improves the readiness of networks against cyber threats. KEYWORDS: Software Defined Networks, Intrusion Detection System, Machine Learning, Network Security, Logistic Regression, Extra Trees Classifier, Random Forest, LightGBM, Precision-Recall Curve, ROC Curve.
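The correlation-based feature selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or cut-off was used, so Pearson correlation and a threshold of 0.3 are assumptions here. The column names and toy values are hypothetical and are not taken from sdn_intrusion.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has zero variance; treat it as uncorrelated.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.3):
    """Keep features whose |correlation with the target| meets the threshold.

    `threshold` is an assumed cut-off, not a value reported in the study.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Hypothetical toy data: label 1 marks an intrusion, 0 marks benign traffic.
columns = {
    "packet_rate": [10, 12, 95, 11, 90, 99],
    "flow_duration": [5, 7, 6, 6, 5, 7],
    "const_flag": [1, 1, 1, 1, 1, 1],
}
target = [0, 0, 1, 0, 1, 1]

selected, scores = select_features(columns, target)
print(selected)
```

In this toy example only `packet_rate` tracks the label closely, so it is the only feature retained. A real pipeline would apply the same idea to the encoded, cleaned columns of the dataset before model training.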