An FCM-based hybrid method for DDoS attack detection in resource-constrained devices
Abstract Smart interconnected devices belonging to the Internet of Things ecosystem are resource-constrained in terms of hardware and software. They are also prime attack targets for malicious parties. Although there has been an extensive exploration of attack detection methods rooted in machine learning, such approaches necessitate high processing overhead, which is ill-suited for devices of modest processing capabilities. Furthermore, machine learning algorithms are opaque black boxes. Therefore, we present a novel hybrid approach to detect distributed denial-of-service attacks using fuzzy cognitive maps paired with machine learning feature selection. Our approach incorporates contextual information (features) drawn from network packets. We utilize feature selection methods to compute the weights of the features. The weights capture the influence of each input feature on the target output feature that determines the classification of any packet as malicious or benign. The features and weights are used to construct a fuzzy cognitive map for each type of attack. The fuzzy cognitive map is then used to train and test the dataset. We also auto-compute a threshold value that allows our model to classify a packet as malicious or benign. Our model performs best using the weights computed by two particular statistical feature selection algorithms, namely, SelectKBest-Classification and SelectKBest Chi-squared, combined with FCM. Our experiments show that this hybrid approach is simple, reliable, and transparent with a low memory footprint, and therefore well-suited for devices with limited resources.
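The pipeline described in the abstract (feature-selection scores used as FCM edge weights, followed by a thresholded decision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a single-step FCM with a sigmoid activation, scores scaled by their maximum to obtain weights, and a threshold taken as the midpoint between the mean activations of the two classes. The abstract does not specify any of these choices, and the scores and packets below are invented toy values.

```python
import math

def sigmoid(x, lam=1.0):
    """Standard FCM squashing function; lam (steepness) is an assumed parameter."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def normalize_scores(scores):
    """Scale feature-selection scores into [0, 1] so they can act as edge weights."""
    top = max(scores)
    return [s / top for s in scores]

def fcm_output(features, weights, lam=1.0):
    """Single-step inference: weighted sum of input concepts fed to the output concept."""
    total = sum(w * f for w, f in zip(weights, features))
    return sigmoid(total, lam)

def auto_threshold(samples, labels, weights):
    """Assumed rule: midpoint between mean activation of malicious (1) and benign (0)."""
    mal = [fcm_output(x, weights) for x, y in zip(samples, labels) if y == 1]
    ben = [fcm_output(x, weights) for x, y in zip(samples, labels) if y == 0]
    return (sum(mal) / len(mal) + sum(ben) / len(ben)) / 2

def classify(features, weights, threshold):
    """Return 1 (malicious) if the output concept reaches the threshold, else 0."""
    return 1 if fcm_output(features, weights) >= threshold else 0

# Hypothetical per-feature scores, e.g. as a chi-squared selector might produce.
weights = normalize_scores([120.0, 30.0, 60.0])

# Toy training packets with features already scaled to [0, 1].
train_x = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.9, 0.8, 0.7], [0.8, 0.6, 0.9]]
train_y = [0, 0, 1, 1]
threshold = auto_threshold(train_x, train_y, weights)

print(classify([0.85, 0.7, 0.8], weights, threshold))
print(classify([0.15, 0.1, 0.1], weights, threshold))
```

Because inference is one weighted sum and one sigmoid per packet, the model's memory footprint is a single weight vector plus a scalar threshold, and each weight can be read directly as the influence of a feature on the decision.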
- Book Chapter
2
- 10.5772/9153
- Feb 1, 2010
In this chapter we discussed two important problems in the pre-processing step of many supervised learning tasks. A list of well-known algorithms was presented and discussed. A new framework was proposed, extending the concept proposed by the authors in a previous work. This framework was validated by some simulations using the metaheuristics Simulated Annealing and NSGA-II. These simulations show that although the quality of solutions generated by this framework is quite similar to those obtained by sequential executions, this approach reaches the better solutions faster than the other approaches. The framework is based on what we called "power of influence", i.e. the quality of features in a given supervised learning task is intrinsically related to the quality of instances used in this task, and vice versa. Based on this we created the framework that works with two separate wrappers for these two problems, joining them in a single evaluation procedure. 5.1 Future Work - The Framework for Multi-Objective Feature and Instance Selection An important characteristic we want to add to this framework in the future is the possibility to handle the multi-objective versions of the two selection problems. The usage of multiple objectives brings new power but also new problems to the search processes. In these formulations, the characteristic of total ordering is replaced by partial ordering, using the concept of Pareto optimality. The ideas of better and worse are replaced by dominance and nondominance. Given two solutions a, b and a set of functions F to be minimized (or maximized, but in this explanation we suppose they are to be minimized), we say that a weakly dominates b if and only if
- Research Article
- 10.1093/eurheartj/ehab724.3069
- Oct 12, 2021
- European Heart Journal
Background Thrombolysis in Myocardial Infarction (TIMI) is used in predicting the mortality rate of acute coronary syndrome (ACS) patients. TIMI was developed based on a Western cohort with limited data on Asian cohorts. There are separate TIMI scores for STEMI and NSTEMI. Deep learning (DL) and machine learning (ML) algorithms such as support vector machine (SVM) applied to population-specific datasets resulted in a higher area under the curve (AUC) than TIMI. A limitation of DL is that the features selected by the algorithm are unknown, unlike with ML algorithms. Purpose To construct a single in-hospital mortality risk scoring system that combines SVM feature importance and the DL algorithm in ASIAN patients with ACS and that is applicable to both STEMI and NSTEMI patients. To investigate the performance of DL constructed using predictors selected from SVM feature extraction and of DL using complete features, and to compare both with the TIMI risk score for STEMI and NSTEMI patients. Methods We constructed four algorithms: i) DL and SVM algorithms with features selected from SVM variable importance, ii) DL and SVM algorithms without feature selection. SVM feature importance with the backward elimination method was used to select and rank important variables. We used registry data from the National Cardiovascular Disease Database covering 13190 patients. Fifty-four parameters including demographics, cardiovascular risk, medications and clinical variables were considered. AUC was used as the performance evaluation metric. All algorithms were validated using a validation dataset and compared to the conventional TIMI for STEMI and NSTEMI. Results Validation results in Figure 1 are by STEMI and NSTEMI patients. Both DL algorithms outperformed ML and TIMI score on validation data. Similar performance is observed for DL and SVM algorithms using all predictors (54 predictors) and DL and SVM algorithms using selected predictors (14 predictors). 
Predictors selected by the SVM feature selection are: age, heart rate, Killip class, fasting blood glucose, ST-elevation, CABG, cardiac catheterization, angina episode, HDLC, LDC, other lipid-lowering agents, statin, anti-arrhythmic agent, oralhypogly. CABG and pharmacotherapy drugs as selected predictors improve mortality prediction compared to TIMI score. In DL, 25.87% of STEMI patients and 19.71% of NSTEMI patients are estimated as high risk (risk probabilities of >50%). TIMI underestimated the risk of mortality of high-risk patients (≥5 risk scores) with 13.08% from STEMI patients and 4.65% from NSTEMI patients (Figure 2). Conclusions In the ASIAN multi-ethnicity population, patients with ACS can be better classified using one single algorithm compared to a conventional method like TIMI, which requires two different scores. Combining ML feature selection with DL allows the identification of distinct factors related to in-hospital mortality of ACS patients in a unique ASIAN population for better mortality prediction. Funding Acknowledgement Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Technology Development Fund 1 Figure 1. Performance results. Figure 2. Analysis on the validation set.
- Research Article
8
- 10.1038/s41598-022-18839-9
- Oct 20, 2022
- Scientific Reports
Limited research has been conducted in Asian elderly patients (aged 65 years and above) for in-hospital mortality prediction after an ST-segment elevation myocardial infarction (STEMI) using Deep Learning (DL) and Machine Learning (ML). We used DL and ML to predict in-hospital mortality in Asian elderly STEMI patients and compared them to a conventional risk score for myocardial infarction outcomes. Malaysia's National Cardiovascular Disease Registry comprises an ethnically diverse Asian elderly population (3991 patients). Fifty variables helped in establishing the in-hospital death prediction model. The TIMI score was used to predict mortality using DL and feature selection methods from ML algorithms. The main performance metric was the area under the receiver operating characteristic curve (AUC). The DL and ML models constructed using ML feature selection outperform the conventional risk score, TIMI (AUC 0.75). DL built from ML features (AUC ranging from 0.93 to 0.95) outscored DL built from all features (AUC 0.93). The TIMI score underestimates mortality in the elderly. TIMI predicts 18.4% higher mortality than the DL algorithm (44.7%). All ML feature selection algorithms identify age, fasting blood glucose, heart rate, Killip class, oral hypoglycemic agent, systolic blood pressure, and total cholesterol as common predictors of mortality in the elderly. In a multi-ethnic population, DL outperformed the TIMI risk score in classifying elderly STEMI patients. ML improves death prediction by identifying separate characteristics in older Asian populations. Continuous testing and validation will improve future risk classification, management, and results.
- Research Article
12
- 10.1093/rheumatology/keac032
- Jan 30, 2022
- Rheumatology
To develop a hypothesis-free model that best predicts response to MTX drug in RA patients utilizing biologically meaningful genetic feature selection of potentially functional single nucleotide polymorphisms (pfSNPs) through robust machine learning (ML) feature selection methods. MTX-treated RA patients with known response were divided in a 4:1 ratio into training and test sets. From the patients' exomes, potential features for classifier prediction were identified from pfSNPs and non-genetic factors through ML using recursive feature elimination with cross-validation incorporating the random forest classifier. Feature selection was repeated on random subsets of the training cohort, and consensus features were assembled into the final feature set. This feature set was evaluated for predictive potential using six ML classifiers, first by cross-validation within the training set, and finally by analysing its performance with the unseen test set. The final feature set contains 56 pfSNPs and five non-genetic factors. The majority of these pfSNPs are located in pathways related to RA pathogenesis or MTX action and are predicted to modulate gene expression. When used for training in six ML classifiers, performance was good in both the training set (area under the curve: 0.855-0.916; sensitivity: 0.715-0.892; and specificity: 0.733-0.862) and the unseen test set (area under the curve: 0.751-0.826; sensitivity: 0.581-0.839; and specificity: 0.641-0.923). Sensitive and specific predictors of MTX response in RA patients were identified in this study through a novel strategy combining biologically meaningful and machine learning feature selection and training. These predictors may facilitate better treatment decision-making in RA management.
- Research Article
3
- 10.11591/ijeecs.v35.i1.pp354-365
- Jul 1, 2024
- Indonesian Journal of Electrical Engineering and Computer Science
Machine learning (ML) techniques empower computers to learn from data and make predictions or decisions in various domains, while preprocessing methods assist in cleaning and transforming data before it can be effectively utilized by ML. Feature selection in ML is a critical process that significantly influences the performance and effectiveness of models. By carefully choosing the most relevant and informative attributes from the dataset, feature selection enhances model accuracy, reduces overfitting, and minimizes computational complexity. In this study, we leverage the UAH-DriveSet dataset to classify driver behavior, employing filter, embedded, and wrapper methods encompassing 10 distinct feature selection techniques. Through the utilization of diverse ML algorithms, we effectively categorize driver behavior into normal, drowsy, and aggressive classes. The second objective is to employ feature selection techniques to pinpoint the most influential features impacting driver behavior. As a result, random forest emerges as the top-performing classifier, achieving an accuracy of 96.4% and an F1-score of 96.36% using backward feature selection in 7.43 s, while K-nearest neighbour (K-NN) attains an accuracy of 96.29% with forward feature selection in 0.05 s. Following our comprehensive results, we deduce that the primary influential features for studying driver behavior include speed (km/h), course, yaw, impact time, road width, distance to the ahead vehicle, vehicle position, and number of detected vehicles.
- Research Article
3
- 10.1038/s41598-024-84879-y
- Apr 16, 2025
- Scientific Reports
Distributed Denial-of-Service (DDoS) attacks have become a critical issue in cyber security. They can lead to a temporary or even prolonged loss of service for users. These attacks mainly target e-commerce platforms, online services, and financial institutions. DDoS attacks need to be detected since they cause serious problems. Supervised machine learning models are effective mechanisms for detecting DDoS attacks. In this paper, a PCA-based Enhanced Distributed DDoS Attack Detection (EDAD) framework has been proposed. Various Machine Learning (ML) algorithms and feature selection techniques have been used to detect DDoS attacks. Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbours (KNN), Decision Tree (DT) supervised models, and the Principal Component Analysis (PCA) feature selection method are used to differentiate between attack and regular traffic. The CICIDS2018, CICIDS2017, and CICDDoS-2019 datasets are used to evaluate the performances of ML algorithms. Various performance metrics of these algorithms are studied and compared to find the best algorithm that yields the highest accuracy. It is found that RF yields the highest accuracy of 98.9% on CICIDS2017. In the CICDDoS2019 dataset, RF and KNN yield a higher accuracy of 98.7%. On the CICIDS2018 dataset, SVM gives the highest accuracy of 98.7%.
- Research Article
8
- 10.3390/a15100383
- Oct 19, 2022
- Algorithms
The features of a dataset play an important role in the construction of a machine learning model. Because big datasets often have a large number of features, they may contain features that are less relevant to the machine learning task, which makes the process more time-consuming and complex. In order to facilitate learning, it is always recommended to remove the less significant features. The process of eliminating the irrelevant features and finding an optimal feature set involves comprehensively searching the dataset and considering every subset in the data. In this research, we present a distributed wrapper method for feature selection, based on fuzzy cognitive map learning, that is able to extract those features from a dataset that play the most significant role in decision making. Fuzzy cognitive maps (FCMs) represent a hybrid computing technique combining elements of both fuzzy logic and cognitive maps. Using Spark’s resilient distributed datasets (RDDs), the proposed model can work effectively in a distributed manner for quick, in-memory processing along with effective iterative computations. According to the experimental results, when the proposed model is applied to a classification task, the features selected by the model help to expedite the classification process. The selection of relevant features using the proposed algorithm is on par with existing feature selection algorithms. In conjunction with a random forest classifier, the proposed model produced an average accuracy above 90%, as opposed to 85.6% accuracy when no feature selection strategy was adopted.
- Research Article
5
- 10.1080/23279095.2024.2382823
- Jul 31, 2024
- Applied Neuropsychology: Adult
The cognitive impairment known as dementia affects millions of individuals throughout the globe. The use of machine learning (ML) and deep learning (DL) algorithms has shown great promise as a means of early identification and treatment of dementia. Dementias such as Alzheimer’s Dementia, frontotemporal dementia, Lewy body dementia, and vascular dementia are all discussed in this article, along with a literature review on using ML algorithms in their diagnosis. Different ML algorithms, such as support vector machines, artificial neural networks, decision trees, and random forests, are compared and contrasted, along with their benefits and drawbacks. As discussed in this article, accurate ML models may be achieved by carefully considering feature selection and data preparation. We also discuss how ML algorithms can predict disease progression and patient responses to therapy. However, overreliance on ML and DL technologies should be avoided without further proof. It’s important to note that these technologies are meant to assist in diagnosis but should not be used as the sole criteria for a final diagnosis. The research implies that ML algorithms may help increase the precision with which dementia is diagnosed, especially in its early stages. The efficacy of ML and DL algorithms in clinical contexts must be verified, and ethical issues around the use of personal data must be addressed, but this requires more study.
- Research Article
78
- 10.1109/tii.2019.2936825
- Mar 1, 2020
- IEEE Transactions on Industrial Informatics
Privacy preserving in machine learning is a crucial issue in industry informatics since data used for training in industries usually contain sensitive information. Existing differentially private machine learning algorithms have not considered the impact of data correlation, which may lead to more privacy leakage than expected in industrial applications. For example, data collected for traffic monitoring may contain some correlated records due to temporal correlation or user correlation. To fill this gap, in this article, we propose a correlation reduction scheme with differentially private feature selection considering the issue of privacy loss when data have correlation in machine learning tasks. The proposed scheme involves five steps with the goal of managing the extent of data correlation, preserving the privacy, and supporting accuracy in the prediction results. In this way, the impact of data correlation is relieved with the proposed feature selection scheme, and moreover the privacy issue of data correlation in learning is guaranteed. The proposed method can be widely used in machine learning algorithms, which provide services in industrial areas. Experiments show that the proposed scheme can produce better prediction results with machine learning tasks and fewer mean square errors for data queries compared to existing schemes.
- Research Article
6
- 10.1186/s12931-024-02911-1
- Jul 24, 2024
- Respiratory Research
Background The use of machine learning (ML) methods would improve the diagnosis of small airway dysfunction (SAD) in subjects with chronic respiratory symptoms and preserved pulmonary function (PPF). This paper evaluated the performance of several ML algorithms associated with the impulse oscillometry (IOS) analysis to aid in the diagnosis of respiratory changes in SAD. We also identified the best configuration for this task. Methods IOS and spirometry were measured in 280 subjects, including a healthy control group (n = 78), a group with normal spirometry (n = 158) and a group with abnormal spirometry (n = 44). Various supervised machine learning (ML) algorithms and feature selection strategies were examined, such as Support Vector Machines (SVM), Random Forests (RF), Adaptive Boosting (ADABOOST), Naive Bayesian (BAYES), and K-Nearest Neighbors (KNN). Results The first experiment of this study demonstrated that the best oscillometric parameter (BOP) was R5, with an AUC value of 0.642, when comparing a healthy control group (CG) with patients in the group without lung volume-defined SAD (PPFN). The AUC value of BOP in the control group was 0.769 compared with patients with spirometry-defined SAD (PPFA) in the PPF population. In the second experiment, the ML technique was used. In CG vs PPFN, RF and ADABOOST had the best diagnostic results (AUC = 0.914, 0.915), with significantly higher accuracy compared to BOP (p < 0.01). In CG vs PPFA, RF and ADABOOST had the best diagnostic results (AUC = 0.951, 0.971) and significantly higher diagnostic accuracy (p < 0.01). In the third, fourth and fifth experiments, different feature selection techniques allowed us to find the best IOS parameters (R5, (R5-R20)/R5 and Fres). 
The results demonstrate that the performance of ADABOOST remained essentially unaltered following the application of the feature selector, whereas the diagnostic accuracy of the remaining four classifiers (RF, SVM, BAYES, and KNN) is marginally enhanced. Conclusions IOS combined with ML algorithms provides a new method for diagnosing SAD in subjects with chronic respiratory symptoms and PPF. The present study’s findings provide evidence that this combination may help in the early diagnosis of respiratory changes in these patients.
- Supplementary Content
31
- 10.3390/ma16083134
- Apr 16, 2023
- Materials
Perovskite materials have been one of the most important research objects in materials science due to their excellent photoelectric properties as well as correspondingly complex structures. Machine learning (ML) methods have been playing an important role in the design and discovery of perovskite materials, while feature selection as a dimensionality reduction method has occupied a crucial position in the ML workflow. In this review, we introduced the recent advances in the applications of feature selection in perovskite materials. First, the development tendency of publications about ML in perovskite materials was analyzed, and the ML workflow for materials was summarized. Then the commonly used feature selection methods were briefly introduced, and the applications of feature selection in inorganic perovskites, hybrid organic-inorganic perovskites (HOIPs), and double perovskites (DPs) were reviewed. Finally, we put forward some directions for the future development of feature selection in machine learning for perovskite material design.
- Research Article
57
- 10.1186/1471-2105-12-375
- Sep 23, 2011
- BMC Bioinformatics
Background The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers. Results We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. As compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher), or a recursive feature elimination embedded in SVM (RFE), TSP is increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of the classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms k-TSP classifier in all datasets, and achieves either comparable or superior performance to that using SVM alone. 
In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets. Conclusions The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that as a feature selector, it is better tuned to certain data characteristics, i.e. correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis.
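The "relative expression ordering of gene pairs" idea can be illustrated with a short sketch. It scores each gene pair (i, j) by how differently the event "gene i is expressed below gene j" occurs in the two classes, keeps the k best pairs, and turns each sample into binary pair-order features that a downstream classifier such as an SVM could consume. This is a simplified illustration under stated assumptions: it omits the tie-breaking and disjoint-pair constraints of the published k-TSP procedure, and the function names and toy expression matrix are invented for the example.

```python
from itertools import combinations

def pair_score(expr, labels, i, j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| over the samples."""
    def prob(cls):
        rows = [x for x, y in zip(expr, labels) if y == cls]
        return sum(1 for x in rows if x[i] < x[j]) / len(rows)
    return abs(prob(0) - prob(1))

def top_pairs(expr, labels, k):
    """Rank all gene pairs by score and keep the k highest (no disjointness check)."""
    n_genes = len(expr[0])
    scored = [(pair_score(expr, labels, i, j), (i, j))
              for i, j in combinations(range(n_genes), 2)]
    scored.sort(key=lambda t: -t[0])
    return [pair for _, pair in scored[:k]]

def pair_features(sample, pairs):
    """Encode a sample as 0/1 indicators of x_i < x_j for each selected pair."""
    return [1 if sample[i] < sample[j] else 0 for i, j in pairs]

# Toy expression matrix: rows are samples, columns are genes.
expr = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.5],
    [7.0, 3.0, 2.5],
    [8.0, 2.0, 2.9],
]
labels = [0, 0, 1, 1]

pairs = top_pairs(expr, labels, k=1)
print(pairs)
print(pair_features([1.5, 4.0, 2.0], pairs))
```

Because the features depend only on the within-sample ordering of two genes, they are unaffected by any monotone rescaling of a sample's expression values.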
- Research Article
7
- 10.54554/jtec.2023.15.03.002
- Sep 30, 2023
- Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Thyroid disease is one of the most disturbing hormonal disorders faced by the global population. To help the healthcare industry to diagnose the disorder rapidly and accurately, supervised machine learning algorithms and feature selection were introduced to play an essential role in predicting whether a patient has developed thyroid disease from his/her various characteristics. Therefore, in this work, a new feature selection library was introduced, which was the Featurewiz in the Python library. The goals were to present the performance of the Featurewiz library and to decide on a remarkable model for thyroid disease prediction among several machine learning models, such as Decision Tree, K-Nearest Neighbor, Logistic Regression, Naïve Bayes, Support Vector Classifier, and ensembled machine learning algorithms (Random Forest and Extreme Gradient Boost). A data set consisting of records of thyroid patients in Australia was used to develop the machine-learning models. After the data set was cleaned, exploratory data analysis was carried out. The models were then built in two ways: without feature selection and with feature selection. The feature selection process was conducted by using a new Python library called Featurewiz. The performances of the models from the two operations were evaluated using three performance metrics, including accuracy, F1-score, and AUC (Area Under Curve) value from ROC (Receiver Operating Characteristics Curve). From the two operations, the results are similar in the way that tree-based models, especially those formed by the ensemble method, outperform the statistical models. Initially, in the process without feature selection, the champion model is XGBoost with 99.23% accuracy, while Random Forest ranks second with 98.79% accuracy. However, after the feature selection, the result reveals that the champion model is Random Forest. 
This model achieves an improvement of 0.66% in accuracy (99.45%), making it the best model from both operations. The model also scores 0.99 and 0.97 in F1-score and AUC values, respectively. The valuable insights gained from this study can serve as a comprehensive framework for machine learning applications in predicting thyroid illness. Additionally, the study highlights the advantageous utilization of the Python feature selection library, Featurewiz. With the combination of Featurewiz and machine learning applications, medical authorities can save time and reduce the risk of misdiagnosis when identifying patients with thyroid disease.
- Research Article
3
- 10.1002/bdr2.2245
- Sep 8, 2023
- Birth defects research
International Classification of Diseases (ICD) codes recorded in administrative data are often used to identify congenital heart defects (CHD). However, these codes may inaccurately identify true positive (TP) CHD individuals. CHD surveillance could be strengthened by accurate CHD identification in administrative records using machine learning (ML) algorithms. To identify features relevant to accurate CHD identification, traditional ML models were applied to a validated dataset of 779 patients; encounter level data, including ICD-9-CM and CPT codes, from 2011 to 2013 at four US sites were utilized. Five-fold cross-validation determined overlapping important features that best predicted TP CHD individuals. Median values and 95% confidence intervals (CIs) of area under the receiver operating curve, positive predictive value (PPV), negative predictive value, sensitivity, specificity, and F1-score were compared across four ML models: Logistic Regression, Gaussian Naive Bayes, Random Forest, and eXtreme Gradient Boosting (XGBoost). Baseline PPV was 76.5% from expert clinician validation of ICD-9-CM CHD-related codes. Feature selection for ML decreased 7138 features to 10 that best predicted TP CHD cases. During training and testing, XGBoost performed the best in median accuracy (F1-score) and PPV, 0.84 (95% CI: 0.76, 0.91) and 0.94 (95% CI: 0.91, 0.96), respectively. When applied to the entire dataset, XGBoost revealed a median PPV of 0.94 (95% CI: 0.94, 0.95). Applying ML algorithms improved the accuracy of identifying TP CHD cases in comparison to ICD codes alone. Use of this technique to identify CHD cases would improve generalizability of results obtained from large datasets to the CHD patient population, enhancing public health surveillance efforts.
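For reference, the metrics compared in this abstract all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The counts are invented for illustration and are not taken from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = confusion_metrics(tp=90, fp=10, fn=20, tn=80)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

PPV depends on how common true cases are among the flagged records, which is why a baseline PPV is reported for the code-based approach before any model is applied.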
- Research Article
59
- 10.3390/rs12132110
- Jul 1, 2020
- Remote Sensing
Leaf area index (LAI) is an essential vegetation parameter that represents the light energy utilization and vegetation canopy structure. As the only in-operation hyperspectral satellite launched by China, GF-5 is potentially useful for accurate LAI estimation. However, no research has focused on evaluating GF-5 data for LAI estimation. Hyperspectral remote sensing data contains abundant information about the reflective characteristics of vegetation canopies, but this abundance of data also easily results in a dimensionality curse. Therefore, feature selection (FS) is necessary to reduce data redundancy to achieve more reliable estimations. Currently, machine learning (ML) algorithms have been widely used for FS. Moreover, the same ML algorithm is usually used for both FS and regression in LAI estimation. However, no evidence suggests that this is the optimal solution. Therefore, this study focuses on evaluating the capacity of GF-5 spectral reflectance for estimating LAI and the performances of different combinations of FS and ML algorithms. Firstly, the PROSAIL model, which couples the leaf optical properties model PROSPECT and the scattering by arbitrarily inclined leaves (SAIL) model, was used to generate simulated GF-5 reflectance data under different vegetation and soil conditions, and then three FS methods, including random forest (RF), K-means clustering (K-means) and mean impact value (MIV), and three ML algorithms, including random forest regression (RFR), back propagation neural network (BPNN) and K-nearest neighbor (KNN), were used to develop nine LAI estimation models. The FS process was conducted twice using different strategies: Firstly, three FS methods were conducted to search for the lowest dimension number that maintained the estimation accuracy of all bands. Then, the sequential backward selection (SBS) method was used to eliminate the bands having minimal impact on LAI estimation accuracy. 
Finally, the three best estimation models were selected and evaluated using reference LAI. The results showed that although the RF_RFR model (RF used for feature selection and RFR used for regression) achieved reliable LAI estimates (coefficient of determination (R2) = 0.828, root mean square error (RMSE) = 0.839), the poor performance (R2 = 0.763, RMSE = 0.987) of the MIV_BPNN model (MIV used for feature selection and BPNN used for regression) suggested that feature selection and regression conducted by the same ML algorithm could not always ensure an optimal estimation. Moreover, RF selection preserved the most informative bands for LAI estimation so that each ML regression method could achieve satisfactory estimation results. Finally, the results indicated that the RF_KNN model (RF used for feature selection and KNN used for regression) with seven GF-5 spectral band reflectances achieved better estimation results than the others when validated by simulated data (R2 = 0.834, RMSE = 0.824) and actual reference LAI (R2 = 0.659, RMSE = 0.697).