Anomaly detection in NetFlow traffic: workflow for dataset preparation and analysis
Information and communication technology (ICT) is crucial for maintaining efficient communications, enhancing processes, and enabling digital transformation. As ICT becomes increasingly significant in everyday life, securing it is essential for maintaining digital trust and resilience against evolving cyber threats. These technologies generate large amounts of data that should be analyzed simultaneously to detect threats to an ICT system and protect the sensitive information it may contain. NetFlow is a network protocol that can be used to monitor network traffic, collect Internet Protocol (IP) address information, and detect anomalies. The article follows the design science research (DSR) methodology toward the objective of providing a method for developing sets of features for NetFlow analysis with machine learning. The feature sets were analyzed and validated by implementing anomaly detection with the K-means clustering algorithm and time-series forecasting with the long short-term memory (LSTM) method. The study provides two separate feature sets, one for each machine learning method (24 features for clustering and 14 for LSTM), an overview of the anomaly detection methods used in this research, and a method for combining both machine learning approaches. Furthermore, the study introduces a method that integrates the outputs of both models and evaluates the reliability of the final decision based on Bayes' theorem and the models' previous performance.
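The Bayes'-theorem fusion step described above can be sketched as follows. This is only an illustration of the idea: the function, the detector performance rates (tpr, fpr) and the base rate are hypothetical, not values from the article.

```python
def bayes_update(prior, flagged, tpr, fpr):
    """Update P(anomaly) after one detector's verdict via Bayes' theorem.

    tpr = P(flag | anomaly), fpr = P(flag | normal) — estimated from the
    detector's previous performance, as the article proposes.
    """
    if flagged:
        like_anom, like_norm = tpr, fpr
    else:
        like_anom, like_norm = 1 - tpr, 1 - fpr
    evidence = like_anom * prior + like_norm * (1 - prior)
    return like_anom * prior / evidence

# Hypothetical figures: 1% base rate, then both detectors flag the flow.
p = 0.01
p = bayes_update(p, True, tpr=0.90, fpr=0.05)   # K-means detector flags
p = bayes_update(p, True, tpr=0.85, fpr=0.10)   # LSTM forecaster also flags
print(round(p, 3))   # → 0.607
```

Chaining the two updates this way treats the detectors' verdicts as conditionally independent given the flow's true class, which is the usual simplifying assumption for this kind of fusion.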
- Conference Article
- 10.1109/iccit55355.2022.10119090
- Nov 22, 2022
Covid-19 emerged as a pandemic outbreak that spread almost worldwide at the end of December 2019, and the pandemic was still ongoing while this research was carried out. Many countries have made various attempts to overcome Covid-19. In Indonesia, the government and stakeholders, including researchers, have undertaken many activities to reduce the number of positive patients. One of these is the vaccination program, which is believed to be the most effective means of reducing the number of positive Covid-19 cases. But nobody knows when the pandemic will end, so stakeholders need to know the trend of Covid-19 cases in Indonesia to make better decisions. This study aims to predict the number of positive Covid-19 cases in Indonesia through a comparative performance analysis of the Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) machine learning methods. The study used the Indonesian Covid-19 dataset from the Control Team, covering 13 January 2021 to 08 November 2021, with 300 records. The models' predictive performance was evaluated using R-squared (R²), Mean Absolute Error (MAE), and Mean Squared Error (MSE). The research found that Support Vector Regression outperforms Long Short-Term Memory for predicting the number of Covid-19 cases, with R², MAE, and MSE values of 0.902, 0.163, and 0.072, respectively.
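The three metrics this comparison rests on (R², MAE, MSE) can be computed directly; a minimal sketch, where the helper function and the toy series are illustrative and not the study's data:

```python
def regression_metrics(y_true, y_pred):
    """Return (R-squared, MAE, MSE) for a forecast against observations."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)   # total variance around the mean
    r2 = 1 - (mse * n) / ss_tot                     # 1 - SS_res / SS_tot
    return r2, mae, mse

# Toy series standing in for daily case counts (hypothetical numbers).
actual    = [100, 120, 130, 150]
predicted = [ 98, 125, 128, 155]
r2, mae, mse = regression_metrics(actual, predicted)
```

R² close to 1 with low MAE/MSE is what the study uses to rank SVR above LSTM.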
- Research Article
75
- 10.1016/j.neucom.2022.09.003
- Sep 20, 2022
- Neurocomputing
A survey on machine learning models for financial time series forecasting
- Conference Article
1
- 10.1063/5.0112715
- Jan 1, 2023
This paper presents a systematic literature review of the application of machine learning methods to detecting distributed denial-of-service (DDoS) attacks. Several relevant research papers were selected and reviewed based on the methods used, the performance achieved, and the evidence for machine learning techniques in this application. Researchers have dedicated considerable effort to analyzing, summarizing, and evaluating various machine learning methods for detecting DDoS attacks. The purpose of this study is therefore to evaluate several machine learning approaches for detecting DDoS attacks in computer networks. These mechanisms are grouped into five categories: the Multiple Linear Regression method; Deep Neural Network (DNN) and Long Short-Term Memory (LSTM) methods; Recurrent Neural Network (RNN) with Autoencoder; deep learning-based methods; and LSTM with Singular Value Decomposition (SVD). The paper also discusses several open research questions along with the research techniques, parameters, and metrics, and reviews and contrasts summaries of analyses and gaps in deploying a predictive machine learning model. The paper is thus expected to benefit academicians and researchers in developing efficient machine learning solutions for detecting DDoS attacks.
- Research Article
- 10.65112/tcmis.10014
- Oct 15, 2025
- Transactions on Computational Modelling and Intelligent Systems
Malaria remains a major public health burden in Nigeria, where climatic variability plays a critical role in shaping transmission dynamics. This study develops and evaluates climate-based predictive models for malaria incidence by integrating historical malaria surveillance data (2018–2023) with key meteorological variables (temperature, precipitation, humidity, and wind speed) across diverse ecological zones. Both traditional statistical and advanced machine learning (ML) approaches were employed to capture linear and nonlinear relationships between climate factors and malaria occurrence. Multiple Linear Regression (MLR) served as the baseline model, while Random Forest (RF), Support Vector Regression (SVR), Artificial Neural Network (ANN), Gradient Boosting Regression (GBR), XGBoost, and Long Short-Term Memory (LSTM) networks represented ML alternatives. Model performance was assessed using RMSE, MAE, R², and MAPE. Results revealed that ensemble-based ML models significantly outperformed MLR, with XGBoost emerging as the best performer (R² = 0.89; RMSE = 27.9; MAPE = 9.8%), followed closely by GBR and RF. The LSTM model effectively captured temporal dependencies (R² = 0.83), while MLR exhibited limited predictive ability (R² = 0.61). Regional analyses indicated that prediction accuracy was higher in areas with stable climatic conditions and reliable data reporting, whereas variability and data gaps in conflict-affected zones reduced performance. The findings highlight the superior predictive power and adaptability of ensemble ML methods for climate-driven malaria forecasting. The study offers an evidence-based framework for integrating these models into Nigeria’s early warning systems, supporting timely and geographically targeted malaria control interventions.
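Of the four metrics above, MAPE is the one that reports error as a percentage of the observed value, which is why it is quoted for the best model (9.8%). A minimal sketch, with hypothetical counts rather than the study's surveillance data:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no zeros in y_true."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

# Hypothetical monthly malaria case counts vs. a model's output.
cases     = [200, 250, 400]
predicted = [220, 240, 380]
error_pct = mape(cases, predicted)   # average of 10%, 4% and 5%
```

Because each error is scaled by the observed value, MAPE lets zones with very different case loads be compared on one scale, which matters for the regional analysis described above.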
- Research Article
3
- 10.1259/bjr.20220373
- Mar 6, 2023
- The British Journal of Radiology
A dose deposition matrix (DDM) prediction method using several voxel features and a machine learning (ML) approach is proposed for plan optimization in radiation therapy. Head and lung cases with inhomogeneous media are used as training and testing data. The prediction model is a cascade forward backprop neural network whose input is the features of the voxel: 1) voxel to body surface distance along the beamlet axis, 2) voxel to beamlet axis distance, 3) voxel density, 4) heterogeneity-corrected voxel to body surface distance, 5) heterogeneity-corrected voxel to beamlet axis distance, and 6) the voxel dose obtained from the pencil beam (PB) algorithm. The output is the predicted voxel dose corresponding to a beamlet. The predicted DDM was used for plan optimization (ML method) and compared with the dose of MC-based plan optimization (MC method) and the dose of pencil beam-based plan optimization (PB method). The mean absolute error (MAE) over the full volume, relative to the dose of the MC method, was calculated to evaluate the overall dose performance of the final plan. For the patient with a head tumor, the ML method achieves an MAE of 0.49 × 10⁻⁴ versus 1.86 × 10⁻⁴ for PB. For the patient with a lung tumor, the ML method has an MAE of 1.42 × 10⁻⁴ versus 3.72 × 10⁻⁴ for PB. The maximum percentage difference in PTV dose coverage (D98) between the ML and MC methods is no more than 1.2% for the head case, while the difference exceeds 10% using the PB method; for the lung case, the maximum difference between ML and MC is no more than 2.1%, while the PB difference exceeds 16%. In this work, a reliable DDM prediction method is thus established for plan optimization by applying several voxel features and the ML approach.
The results show that the ML method based on voxel features can produce plans comparable to the MC method and better than the PB method in delivering an accurate dose to the patient, which is helpful for rapid plan optimization and accurate dose calculation. This work establishes a new machine learning method based on the relationship between voxel and beamlet features for dose deposition matrix prediction in radiation therapy.
- Research Article
- 10.1149/ma2023-015460mtgabs
- Aug 28, 2023
- Electrochemical Society Meeting Abstracts
To better understand degradation in electrochemical converters and to help correlate certain phenomena with specific operating conditions, machine learning (ML) methods are increasingly being applied. Success has already been achieved in the field of degradation analysis and capacity prediction of lithium-ion batteries¹, for instance. For Solid Oxide Cell (SOC) stacks, ML methods have been applied mainly with the aim of identifying faulty operation modes and diagnosing degradation-related faults². ML approaches usually require a considerable amount of real training data when used for forecasting models. A data consolidation and curation strategy was therefore developed to process the historic long-term test bench data of SOCs collected by Forschungszentrum Jülich over the past years. In comparison to other datasets developed in this field³, the one presented in this work contains SOC stack tests in fuel cell operation with significantly longer operating times under load. A compilation of the sample experiments and their consolidation into a hierarchical data format are presented. Further, an essential part of the strategy is the automatic curation and analysis of electrochemical impedance spectroscopy (EIS) measurements, using a specifically developed procedure in Python. The varying quality of measurements from past years, as well as recurring artefacts such as parasitic inductances, can be addressed in this way. Additionally, distribution of relaxation times (DRT) deconvolutions and equivalent circuit modelling (ECM) are performed as part of the procedure to automatically retrieve feature values from measurements (cf. Fig. 1). The novel dataset, which to the authors’ knowledge includes some of the longest SOC stack tests available, serves as the basis for several evaluations.
In addition to classification and clustering work to derive degradation patterns, in particular based on the EIS data, another focus is the development of forecasting models. The current work is primarily concerned with long short-term memory (LSTM) networks, as well as regression models that make use of both the time series data and the characterisation measurements, such as EIS.
Acknowledgement: The authors would like to thank their colleagues at Forschungszentrum Jülich GmbH for their great support, and the Helmholtz Society as well as the German Federal Ministry of Education and Research for financing these activities as part of the WirLebenSOFC project (03SF0622B).
References:
1: Jones, P.K., Stimming, U. & Lee, A.A. Impedance-based forecasting of lithium-ion battery performance amid uneven usage. Nature Communications 13, 4806 (2022).
2: B. Yang et al. Solid oxide fuel cell systems fault diagnosis: critical summarization, classification, and perspectives. Journal of Energy Storage 34, 102153 (2021).
3: A.K. Padinjarethil, S. Pollok & A. Hagen. Degradation studies using machine learning on novel solid oxide cell database. Fuel Cells 21, 566–576 (2021).
Fig. 1: Flow diagram of the EIS data curation pipeline and curation results for an example EIS measurement.
- Research Article
4
- 10.22146/ijccs.60733
- Oct 31, 2020
- IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
Research on sentiment analysis has increased in recent years. However, there are still few ideas about the handling of negation, including in Indonesian sentences; as a result, sentences that contain negation words are often not assigned the correct polarity. The purpose of this research is to analyze the effect of negation words in Indonesian across positive, neutral, and negative classes, using attention-based Long Short-Term Memory and the word2vec feature extraction method with the continuous bag-of-words (CBOW) architecture. The dataset used is data from Twitter, and model performance is measured by accuracy. Using word2vec with the CBOW architecture and an added attention layer, the Long Short-Term Memory (LSTM) method obtained an accuracy of 78.16% and the Bidirectional Long Short-Term Memory (BiLSTM) method 79.68%, whereas the FSW algorithm achieved 73.50% and FWL 73.79%. It can be concluded that attention-based BiLSTM has the highest accuracy, but the addition of the attention layer to the Long Short-Term Memory method is not very significant for negation handling, because the attention layer cannot determine the words that should be attended to.
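The attention layer discussed above weights the (Bi)LSTM's per-timestep hidden states before classification. A minimal, generic sketch of that pooling step, with toy vectors; this is the general mechanism, not the authors' exact architecture or scoring function:

```python
import math

def attention_pool(hidden_states, scores):
    """Softmax the per-timestep scores, then take the weighted sum of
    hidden states — the pooling idea behind an attention layer on (Bi)LSTM."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# Three timesteps, 2-dimensional hidden states (toy values).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(h, scores=[0.0, 0.0, 0.0])   # equal scores → plain average
```

With equal scores the layer degenerates to mean pooling; the paper's observation that attention "cannot determine the words to pay attention to" amounts to the learned scores failing to single out negation words.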
- Research Article
- 10.47974/jios-1558
- Jan 1, 2024
- Journal of Information and Optimization Sciences
The expansion of online transactions, particularly online credit card transactions, has revolutionized the field of e-commerce and streamlined electronic payment systems. However, this growth has also given rise to a significant challenge in the form of credit card fraud. To combat this issue, banks and financial organizations have recognized the need for robust credit card fraud detection applications. Machine learning (ML) approaches have emerged as a valuable tool in this regard, as they offer the potential to accurately detect and prevent fraudulent transactions. Long Short-Term Memory (LSTM), a recurrent neural network, is used in this study’s evaluation of ML approaches for detecting credit card fraud (CCFD) in online transactions. The most effective LSTM architecture is chosen after a thorough examination based on its capacity to identify credit card fraud with high accuracy and precision. The suggested method makes use of LSTM and RFM analysis to comprehend customer behavior and ADASYN sampling to address class imbalance. The findings show that the selected LSTM architecture, in combination with RFM analysis and ADASYN, delivers great efficiency and efficacy in identifying credit card fraud, hence promoting safe online transactions.
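The RFM analysis the method relies on summarizes each cardholder's history as recency, frequency and monetary value, which then feed the LSTM alongside the transaction sequence. A minimal sketch with a hypothetical transaction schema (the `(date, amount)` pairs are invented for illustration):

```python
from datetime import date

def rfm_features(transactions, today):
    """Recency/Frequency/Monetary summary of one cardholder's history.

    transactions: list of (date, amount) pairs — a hypothetical schema.
    Returns (days since last transaction, transaction count, total spend).
    """
    recency = (today - max(d for d, _ in transactions)).days
    frequency = len(transactions)
    monetary = sum(a for _, a in transactions)
    return recency, frequency, monetary

history = [(date(2024, 1, 3), 25.0),
           (date(2024, 1, 20), 60.0),
           (date(2024, 2, 1), 15.0)]
r, f, m = rfm_features(history, today=date(2024, 2, 11))   # (10, 3, 100.0)
```

A transaction that deviates sharply from a cardholder's RFM profile (e.g. a large spend after long inactivity) is the kind of behavioral signal the fraud model can exploit.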
- Research Article
6
- 10.57197/jdr-2023-0021
- Aug 1, 2023
- Journal of Disability Research
Anomaly detection in the pedestrian walkways of visually impaired people (VIP) is a vital research area that utilizes remote sensing to optimize pedestrian traffic and improve flow. With the power of machine learning (ML) and computer vision (CV), researchers and engineers can formulate effective tools and methods to identify anomalies (i.e. vehicles) and mitigate potential safety hazards in pedestrian walkways. With recent advancements in ML and deep learning (DL), the authors found that the image recognition problem ought to be posed as a two-class classification problem. Therefore, this manuscript presents a new sine cosine algorithm with deep learning-based anomaly detection in pedestrian walkways (SCADL-ADPW). The proposed SCADL-ADPW technique identifies the presence of anomalies in pedestrian walkways in remote sensing images, focusing on the identification and classification of anomalies, i.e. vehicles, in the pedestrian walkways of VIP. To accomplish this, the SCADL-ADPW technique uses the VGG-16 model for feature vector generation. In addition, the sine cosine algorithm (SCA) is used for the optimal hyperparameter tuning process. For anomaly detection, the long short-term memory (LSTM) method is exploited. The experimental results of the SCADL-ADPW technique are studied on the UCSD anomaly detection dataset, and the comparative outcomes demonstrate its improved anomaly detection results.
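The sine cosine algorithm (SCA) used above for hyperparameter tuning is a population-based optimizer in which candidate solutions oscillate around the best solution found so far, with sine/cosine steps whose amplitude decays over iterations. A generic sketch on a toy objective; the function name, parameters and test problem are illustrative, not the paper's configuration:

```python
import math, random

def sine_cosine_minimize(f, bounds, n_agents=20, iters=200, seed=42):
    """Minimal sine cosine algorithm (SCA): agents move around the best
    solution found so far with a step amplitude r1 that decays to zero."""
    rng = random.Random(seed)
    dim = len(bounds)
    agents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    best = min(agents, key=f)[:]
    for t in range(iters):
        r1 = 2.0 * (1 - t / iters)               # exploration → exploitation
        for x in agents:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                x[d] += r1 * wave * abs(r3 * best[d] - x[d])
                lo, hi = bounds[d]
                x[d] = min(max(x[d], lo), hi)    # clamp to the search box
            if f(x) < f(best):
                best = x[:]
    return best

# Sanity check on a toy objective (sphere function), not a real tuning task.
sphere = lambda v: sum(c * c for c in v)
best = sine_cosine_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a setting like SCADL-ADPW, `f` would instead score a hyperparameter vector (e.g. learning rate, units) by validation loss, which is far more expensive per evaluation.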
- Research Article
6
- 10.1108/tqm-01-2023-0017
- Oct 3, 2023
- The TQM Journal
Purpose: Design science research (DSR) is a structured approach for solving complex ill-structured problems in organizations through the development of an artefact followed by its validation. This paper aims to evaluate existing DSR methodology and propose specific accents to promote DSR for environmental, social and governance (ESG)-oriented operational excellence (OPEX) initiatives within organizations.
Design/methodology/approach: This commentary paper is based on an abductive reasoning approach to evaluate and understand DSR and assess its effectiveness for developing solutions to typical ESG-oriented OPEX-based problems within organizations.
Findings: Existing literature on DSR is reviewed, after which it is evaluated on its ability to contribute to the implementation of sustainable solutions for ESG-oriented OPEX-based problems. Based on the review, specific DSR methodological accents are proposed for the development of ESG-oriented OPEX-based solutions in organizations.
Research limitations/implications: This conceptual paper contributes to the conceptual understanding of the applicability, limitations and contextual preconditions for applying DSR. It proposes an explicit and, in some ways, alternative view on DSR research for OPEX researchers to apply and further the body of knowledge on matters of sustainability (ESG) in operations management.
Practical implications: Currently, there is limited understanding and application of the DSR methodology for OPEX-based problem-solving initiatives, as appears in the scant literature on DSR applied to the implementation of OPEX-based initiatives for ESG purposes. This paper aims to challenge and provide accents for DSR applied to OPEX-related problems by means of a DSR framework, and thereby promotes intervention-based studies among researchers.
Originality/value: The proposed step-by-step methodology contains novel elements and is expected to help OPEX-oriented academicians and practitioners in implementing DSR methodology for practice-related problems that need research interventions from academics at Higher Education Institutions.
- Research Article
55
- 10.1016/j.heliyon.2021.e08143
- Oct 1, 2021
- Heliyon
COVID-19 has produced a global pandemic affecting the whole world. Prediction of the rate of COVID-19 spread and modeling of its course have a critical impact on both the health system and policy makers. Indeed, policy making depends on judgments formed by prediction models to propose new strategies and to measure the efficiency of imposed policies. Given the nonlinear and complex nature of this disorder and the difficulties in estimating virus transmission features using traditional epidemic models, artificial intelligence methods have been applied to predict its spread. Given the importance of machine and deep learning approaches in estimating the COVID-19 spreading trend, in the present study we review studies which used these strategies to predict the number of new COVID-19 cases. Adaptive neuro-fuzzy inference systems, long short-term memory, recurrent neural networks and multilayer perceptrons are among the most used strategies in this regard. We compared the performance of several machine learning methods in predicting COVID-19 spread. Root mean squared error (RMSE), mean absolute error (MAE), the coefficient of determination (R²), and mean absolute percentage error (MAPE) were selected as performance measures for comparing the accuracy of the models. R² values ranged from 0.64 to 1 for artificial neural networks (ANN) and bidirectional long short-term memory (LSTM), respectively. Adaptive neuro-fuzzy inference systems (ANFIS), Autoregressive Integrated Moving Average (ARIMA) and multilayer perceptrons (MLP) also have R² values near 1. ARIMA and LSTM had the highest MAPE values. Collectively, these models are capable of identifying learning parameters that affect dissimilarities in COVID-19 spread across various regions or populations, combining numerous intervention methods and implementing what-if scenarios by integrating data from diseases with trends analogous to COVID-19.
Therefore, application of these methods would help in precise policy making to design the most appropriate interventions and avoid non-efficient restrictions.
- Research Article
- 10.24135/rangahau-aranga.v4i1.263
- Mar 26, 2025
- Rangahau Aranga: AUT Graduate Review
In recent years, the combination of radar sensors and machine learning has transformed vital sign monitoring, especially in the healthcare and automobile industries. This study uses mmWave radar technology in vehicles to monitor vital signs, addressing issues such as driver weariness. When integrated with machine learning, the technology provides non-invasive, privacy-preserving physiological monitoring in settings such as patient care facilities and vehicle cabins, while still performing efficiently in demanding environments. Machine learning improves the accuracy of radar-based monitoring by processing vast amounts of sensor data, but maintaining precision in noisy situations such as vehicles is difficult. This study addresses these issues by correctly monitoring both drivers and passengers (Ahmed & Cho, 2024). This presentation discusses hardware restrictions, implemented solutions, and current software concerns related to vital sign acquisition. Techniques like Gaussian noise addition and Generative Adversarial Networks (GANs) can enhance the accuracy and reliability of collected datasets. Autoencoders are preferred over traditional filtering methods like Kalman filters, as they can effectively handle non-linear problems and remove noise and background. Machine learning approaches such as Convolutional Neural Networks (CNNs) and self-calibrated Long Short-Term Memory (LSTM) are found to be more effective for feature extraction in diverse environmental conditions (Zheng et al., 2021). Traditional autoregressive models are noise-sensitive, so machine learning methods like Temporal Convolutional Networks (TCNs) are recommended for signal processing, real-time vital sign recording, and reconstructing heart rate variability without attached sensors.
Cutting-edge hardware solutions like radars and graphical processing machines, such as Jetson Nano, are utilized by the research team to address the challenges of real-time machine learning (Zhang et al., 2022).
- Research Article
82
- 10.1016/j.ocemod.2021.101832
- Jun 8, 2021
- Ocean Modelling
Predicting Lake Erie wave heights and periods using XGBoost and LSTM
- Dissertation
- 10.53846/goediss-6872
- Feb 21, 2022
Context- and Physiology-aware Machine Learning for Upper-Limb Myocontrol
- Research Article
32
- 10.1016/j.ecoinf.2023.102253
- Aug 9, 2023
- Ecological Informatics
A novel hybrid machine learning model for prediction of CO2 using socio-economic and energy attributes for climate change monitoring and mitigation policies