Estimating the total nitrogen content of Aquilaria sinensis leaves based on a hybrid feature selection algorithm and image data from a modified digital camera

Similar Papers
  • Research Article
  • Cite Count: 74
  • 10.3390/rs12132110
Leaf Area Index Estimation Algorithm for GF-5 Hyperspectral Data Based on Different Feature Selection and Machine Learning Methods
  • Jul 1, 2020
  • Remote Sensing
  • Zhulin Chen + 10 more

Leaf area index (LAI) is an essential vegetation parameter that represents the light energy utilization and vegetation canopy structure. As the only in-operation hyperspectral satellite launched by China, GF-5 is potentially useful for accurate LAI estimation. However, no research has focused on evaluating GF-5 data for LAI estimation. Hyperspectral remote sensing data contains abundant information about the reflective characteristics of vegetation canopies, but these abundant data also easily result in a dimensionality curse. Therefore, feature selection (FS) is necessary to reduce data redundancy to achieve more reliable estimations. Currently, machine learning (ML) algorithms have been widely used for FS. Moreover, the same ML algorithm is usually used for both FS and regression in LAI estimation. However, no evidence suggests that this is the optimal solution. Therefore, this study focuses on evaluating the capacity of GF-5 spectral reflectance for estimating LAI and the performances of different combinations of FS and ML algorithms. Firstly, the PROSAIL model, which couples the leaf optical properties model PROSPECT with the scattering by arbitrarily inclined leaves (SAIL) model, was used to generate simulated GF-5 reflectance data under different vegetation and soil conditions, and then three FS methods, including random forest (RF), K-means clustering (K-means) and mean impact value (MIV), and three ML algorithms, including random forest regression (RFR), back propagation neural network (BPNN) and K-nearest neighbor (KNN), were used to develop nine LAI estimation models. The FS process was conducted twice using different strategies: first, the three FS methods were conducted to search for the lowest dimension number that maintained the estimation accuracy of all bands. Then, the sequential backward selection (SBS) method was used to eliminate the bands having minimal impact on LAI estimation accuracy.
Finally, the three best estimation models were selected and evaluated using reference LAI. The results showed that although the RF_RFR model (RF used for feature selection and RFR used for regression) achieved reliable LAI estimates (coefficient of determination (R2) = 0.828, root mean square error (RMSE) = 0.839), the poor performance (R2 = 0.763, RMSE = 0.987) of the MIV_BPNN model (MIV used for feature selection and BPNN used for regression) suggested that using the same ML algorithm for both feature selection and regression does not always ensure an optimal estimation. Moreover, RF selection preserved the most informative bands for LAI estimation so that each ML regression method could achieve satisfactory estimation results. Finally, the results indicated that the RF_KNN model (RF used for feature selection and KNN used for regression) with seven GF-5 spectral band reflectances achieved better estimation results than the others when validated by simulated data (R2 = 0.834, RMSE = 0.824) and actual reference LAI (R2 = 0.659, RMSE = 0.697).
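
The sequential backward selection (SBS) step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `tolerance` parameter, the band names and the toy scoring function are all assumptions made for illustration. In practice the score would come from a validated regression model rather than a lookup table.

```python
from typing import Callable, List, Sequence


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Greedily drop the feature whose removal hurts the score least.

    Stops when every possible removal would lower the score by more than
    `tolerance`, or when a single feature remains.
    """
    selected = list(features)
    current = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        best_score, drop = max(candidates)
        if best_score < current - tolerance:
            break  # every removal costs too much accuracy
        selected.remove(drop)
        current = best_score
    return selected


def toy_score(subset: List[str]) -> float:
    # Hypothetical score: only bands "b2" and "b5" carry signal, and each
    # extra band costs a small penalty. Purely made-up numbers.
    informative = {"b2": 0.5, "b5": 0.3}
    return sum(informative.get(f, 0.0) for f in subset) - 0.01 * len(subset)


bands = ["b1", "b2", "b3", "b4", "b5"]
kept = sequential_backward_selection(bands, toy_score)
print(kept)
```

With this toy score the uninformative bands are removed one at a time and the loop stops once only the two informative bands are left, because dropping either of them would reduce the score.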

  • Research Article
  • Cite Count: 63
  • 10.1007/s00521-024-10226-x
A proposed framework for crop yield prediction using hybrid feature selection approach and optimized machine learning
  • Aug 16, 2024
  • Neural Computing and Applications
  • Mahmoud Abdel-Salam + 2 more

Accurately predicting crop yield is essential for optimizing agricultural practices and ensuring food security. However, existing approaches often struggle to capture the complex interactions between various environmental factors and crop growth, leading to suboptimal predictions. Consequently, identifying the most important features is vital when leveraging the Support Vector Regressor (SVR) for crop yield prediction. In addition, the manual tuning of SVR hyperparameters may not always offer high accuracy. In this paper, we introduce a novel framework for predicting crop yields that addresses these challenges. Our framework integrates a new hybrid feature selection approach with an optimized SVR model to enhance prediction accuracy efficiently. The proposed framework comprises three phases: preprocessing, hybrid feature selection, and prediction. In the preprocessing phase, data normalization is conducted, followed by an application of K-means clustering in conjunction with the correlation-based filter (CFS) to generate a reduced dataset. Subsequently, in the hybrid feature selection phase, a novel hybrid FMIG-RFE feature selection approach is proposed. Finally, the prediction phase introduces an improved variant of the Crayfish Optimization Algorithm (COA), named ICOA, which is used to optimize the hyperparameters of the SVR model, thereby achieving superior prediction accuracy along with the novel hybrid feature selection approach. Several experiments are conducted to assess and evaluate the performance of the proposed framework. The results demonstrated the superior performance of the proposed framework over state-of-the-art approaches. Furthermore, experimental findings regarding the ICOA optimization algorithm affirm its efficacy in optimizing the hyperparameters of the SVR model, thereby enhancing both prediction accuracy and computational efficiency, surpassing existing algorithms.

  • Research Article
  • Cite Count: 21
  • 10.3390/su14148455
Hyperspectral Modeling of Soil Organic Matter Based on Characteristic Wavelength in East China
  • Jul 11, 2022
  • Sustainability
  • Mingsong Zhao + 3 more

Soil organic matter (SOM) is a key index of soil fertility. Visible and near-infrared (VNIR, 350–2500 nm) reflectance spectroscopy is an effective method for modeling SOM content. Characteristic wavelength screening and spectral transformation may improve the performance of SOM prediction. This study aimed to explore the optimal combination of characteristic wavelength selection and spectral transformation for hyperspectral modeling of SOM. A total of 219 topsoil (0–20 cm) samples were collected from two soil types in East China. VNIR reflectance spectra were measured in the laboratory. Firstly, after spectral transformation (inverse-log reflectance (LR), continuum removal (CR) and first-order derivative reflectance (FDR)) of VNIR spectra, characteristic wavelengths were selected by competitive adaptive reweighted sampling (CARS) and uninformative variables elimination (UVE) algorithms. Secondly, the SOM prediction models were constructed based on the partial least squares regression (PLSR), random forest (RF) and support vector regression (SVR) methods using the full spectra and selected wavelengths, respectively. Finally, optimal SOM prediction models were selected for two soil types. The results were as follows: (1) The CARS algorithm screened 40–125 characteristic wavelengths from the full spectra. The UVE algorithm screened 105–884 characteristic wavelengths. (2) For two soil types and full spectra, CARS and UVE improved the SOM modeling precision based on the PLSR and SVR methods. The coefficient of determination (R2) value in the validation of the CARS-PLSR (PLSR model combined with CARS) and CARS-SVR (SVR model combined with CARS) models ranged from 0.69 to 0.95, and the relative percent deviation (RPD) value ranged from 1.74 to 4.31. Lin’s concordance correlation coefficient (LCCC) values ranged from 0.83 to 0.97. The UVE-PLSR and UVE-SVR models showed moderate precision.
(3) The PLSR and SVR modeling accuracies for Paddy soil were better than those for Shajiang black soil. RF models performed worse for both soil types, with the R2 values of validation ranging from 0.22 to 0.68 and RPD values ranging from 1.01 to 1.60. (4) For Paddy soil, the optimal SOM prediction models (highest R2 and RPD, lowest root mean square error (RMSE)) were CR-CARS-PLSR (R2 and RMSE: 0.97 and 1.21 g/kg in calibration sets, 0.95 and 1.72 g/kg in validation sets, RPD: 4.31) and CR-CARS-SVR (R2 and RMSE: 0.98 and 1.04 g/kg in calibration sets, 0.91 and 2.24 g/kg in validation sets, RPD: 3.37). For Shajiang black soil, the optimal SOM prediction models were LR-CARS-PLSR (R2 and RMSE: 0.95 and 0.93 g/kg in calibration sets, 0.86 and 1.44 g/kg in validation sets, RPD: 2.62) and FDR-CARS-SVR (R2 and RMSE: 0.99 and 0.45 g/kg in calibration sets, 0.83 and 1.58 g/kg in validation sets, RPD: 2.38). The results suggested that the CARS algorithm combined with CR and FDR can significantly improve the modeling accuracy of SOM content.
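
To make the RMSE and RPD figures quoted above easier to interpret, here is a minimal sketch of how the two metrics are commonly computed. It assumes the definition widely used in soil spectroscopy, namely RPD as the standard deviation of the reference values divided by the RMSE of prediction; the abstract itself expands the acronym differently and does not give a formula, so this definition is an assumption. The sample values are made up.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed definition: sample standard deviation of observed / RMSE."""
    mean = sum(observed) / len(observed)
    variance = sum((o - mean) ** 2 for o in observed) / (len(observed) - 1)
    return math.sqrt(variance) / rmse(observed, predicted)


# Made-up SOM values in g/kg, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

print(round(rmse(observed, predicted), 3))
print(round(rpd(observed, predicted), 3))
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why the abstract treats the highest RPD as a marker of the best model.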

  • Research Article
  • Cite Count: 7
  • 10.1109/access.2020.3027828
Extension of pQSAR: Ensemble Model Generated by Random Forest and Partial Least Squares Regressions
  • Jan 1, 2020
  • IEEE Access
  • Byung Chun Kim + 4 more

Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potencies of their biological activities. In QSAR models, the physical and chemical information of the molecules is encoded into quantitative numbers called descriptors. Recently, experimental test results (profiles) have been used as descriptors of chemicals. The Profile QSAR 2.0 (pQSAR) model suggested by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) and partial least squares regression (PLSR). In the pQSAR model, one fills the profile table's missing values with RFRs and then builds a PLSR using the profile predictions. Note that in the second step of the pQSAR method, PLSR's predictor variables are profiles, i.e., activity values, and the response variables are also activity values. Thus we can use the PLSRs to update the profile table and then repeat the second step. In this work, we propose an extended model of pQSAR generated by RFRs and PLSRs. Experiments in which the fully predicted initial profile table is updated iteratively by two kinds of prediction models, RFRs and PLSRs, have been conducted for the PKIS and ChEMBL data sets. Even though the prediction performance of individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables for a given iteration shows better performance. This ensemble model has better prediction performance in the sense of Pearson's R2 compared to that of the pQSAR model.

  • Research Article
  • Cite Count: 41
  • 10.3390/rs12081234
Prediction of Early Season Nitrogen Uptake in Maize Using High-Resolution Aerial Hyperspectral Imagery
  • Apr 12, 2020
  • Remote Sensing
  • Tyler Nigon + 5 more

The ability to predict spatially explicit nitrogen uptake (NUP) in maize (Zea mays L.) during the early development stages provides clear value for making in-season nitrogen fertilizer applications that can improve NUP efficiency and reduce the risk of nitrogen loss to the environment. Aerial hyperspectral imaging is an attractive agronomic research tool for its ability to capture spectral data over relatively large areas, enabling its use for predicting NUP at the field scale. The overarching goal of this work was to use supervised learning regression algorithms—Lasso, support vector regression (SVR), random forest, and partial least squares regression (PLSR)—to predict early season (i.e., V6–V14) maize NUP at three experimental sites in Minnesota using high-resolution hyperspectral imagery. In addition to the spectral features offered by hyperspectral imaging, the 10th percentile Modified Chlorophyll Absorption Ratio Index Improved (MCARI2) was made available to the learning models as an auxiliary feature to assess its ability to improve NUP prediction accuracy. The trained models demonstrated robustness by maintaining satisfactory prediction accuracy across locations, pixel sizes, development stages, and a broad range of NUP values (4.8 to 182 kg ha−1). Using the four most informative spectral features in addition to the auxiliary feature, the mean absolute error (MAE) of Lasso, SVR, and PLSR models (9.4, 9.7, and 9.5 kg ha−1, respectively) was lower than that of random forest (11.2 kg ha−1). The relative MAE for the Lasso, SVR, PLSR, and random forest models was 16.5%, 17.0%, 16.6%, and 19.6%, respectively. The inclusion of the auxiliary feature not only improved overall prediction accuracy by 1.6 kg ha−1 (14%) across all models, but it also reduced the number of input features required to reach optimal performance. 
The variance of predicted NUP increased as the measured NUP increased (MAE of the Lasso model increased from 4.0 to 12.1 kg ha−1 for measured NUP less than 25 kg ha−1 and greater than 100 kg ha−1, respectively). The most influential spectral features were oftentimes adjacent to each other (i.e., within approximately 6 nm), indicating the importance of both spectral precision and derivative spectra around key wavelengths for explaining NUP. Finally, several challenges and opportunities are discussed regarding the use of these results in the context of improving nitrogen fertilizer management.
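
The MAE and relative MAE figures reported above can be reproduced in form with a short sketch. The abstract does not state how the relative MAE was normalized; dividing by the mean of the measured values is an assumption here, and the numbers are invented for illustration.

```python
from typing import Sequence


def mae(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def relative_mae(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE divided by the mean measured value (assumed normalization)."""
    return mae(measured, predicted) / (sum(measured) / len(measured))


# Made-up nitrogen uptake values in kg/ha.
measured = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 35.0, 70.0, 70.0]

print(mae(measured, predicted))
print(f"{relative_mae(measured, predicted):.1%}")
```

Expressing the error relative to the mean makes models comparable across sites or growth stages with very different uptake levels, which matters given the wide NUP range the abstract describes.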

  • Research Article
  • Cite Count: 1
  • 10.29244/ijsa.v4i1.610
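A Simulation Study Comparing Partial Least Squares Regression, Support Vector Machine, and Random Forest Methods (original title: KAJIAN SIMULASI PERBANDINGAN METODE REGRESI KUADRAT TERKECIL PARSIAL, SUPPORT VECTOR MACHINE, DAN RANDOM FOREST)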
  • Feb 28, 2020
  • Indonesian Journal of Statistics and Its Applications
  • Asep Andri Fauzi + 2 more

Highly correlated predictors and nonlinear relationships between the response and the predictors can degrade the performance of predictive modeling, especially when using the ordinary least squares (OLS) method. A simple way to address this problem is to use another method such as Partial Least Squares Regression (PLSR), Support Vector Regression with a Radial Basis Function kernel (SVR-RBF), or Random Forest Regression (RFR). The purpose of this study is to compare OLS, PLSR, SVR-RBF, and RFR using simulated data. The methods were evaluated by the root mean square error of prediction (RMSEP). The results showed that in the linear model, SVR-RBF and RFR have large RMSEP; OLS and PLSR are better than SVR-RBF and RFR, and PLSR provides much more stable predictions than OLS in the case of highly correlated predictors and small sample size. For nonlinear data, RFR produced the smallest RMSEP when the data contain highly correlated predictors.

  • Research Article
  • Cite Count: 1
  • 10.11591/ijai.v14.i2.pp1192-1200
A hybrid feature selection with data-driven approach for cardiovascular disease prediction using machine learning
  • Apr 1, 2025
  • IAES International Journal of Artificial Intelligence (IJ-AI)
  • Thoutireddy Shilpa + 1 more

Cardiovascular diseases (CVDs), which comprise various disorders of the heart and blood vessels, are the leading cause of human mortality worldwide. A number of supervised machine learning (ML) approaches in the literature have been found useful in clinical decision support systems (CDSS) for detecting CVDs automatically. The challenge, however, is that their performance tends to decline unless the training data is of a certain standard. Feature selection techniques are one family of approaches to this problem. Despite several notable advancements in the CVD modeling literature, little research addresses the integration of feature selection as a means of enhancing training quality and thus prediction accuracy. Against this background, in this paper we propose a framework called the cardiovascular disease prediction framework (CVDPF) that integrates ML methods. To support this, we designed a new hybrid feature selection (HFS) algorithm that aims to reduce the number of parameters. This algorithm adopts several filter methods in order to enhance its performance for the task of feature selection. To improve the prediction accuracy of CVDs, a number of ML tools using the HFS approach have been designed, collectively termed machine learning based cardiovascular disease prediction (ML-CVDP). The framework and algorithms were validated on a CVD dataset. The experimental findings demonstrated that CVDPF in combination with HFS outperforms other available feature selection methods.

  • Research Article
  • Cite Count: 55
  • 10.1016/j.eswa.2017.06.032
A new hybrid feature selection approach using feature association map for supervised and unsupervised classification
  • Jul 1, 2017
  • Expert Systems with Applications
  • Amit Kumar Das + 3 more

  • Research Article
  • Cite Count: 27
  • 10.1016/j.bspc.2019.101583
Detection of congestive heart failure from short-term heart rate variability segments using hybrid feature selection approach
  • Jun 18, 2019
  • Biomedical Signal Processing and Control
  • Alan Jovic + 2 more

  • Research Article
  • Cite Count: 71
  • 10.5194/bg-14-5551-2017
Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression
  • Dec 8, 2017
  • Biogeosciences
  • Luke Gregor + 2 more

Abstract. The Southern Ocean accounts for 40 % of oceanic CO2 uptake, but the estimates are bound by large uncertainties due to a paucity of observations. Gap-filling empirical methods have been used to good effect to approximate pCO2 from satellite observable variables in other parts of the ocean, but many of these methods are not in agreement in the Southern Ocean. In this study we propose two additional methods that perform well in the Southern Ocean: support vector regression (SVR) and random forest regression (RFR). The methods are used to estimate ΔpCO2 in the Southern Ocean based on SOCAT v3, achieving similar trends to the SOM-FFN method by Landschützer et al. (2014). Results show that the SOM-FFN and RFR approaches have RMSEs of similar magnitude (14.84 and 16.45 µatm, where 1 atm = 101 325 Pa), whereas the SVR method has a larger RMSE (24.40 µatm). However, the larger errors for SVR and RFR are, in part, due to an increase in coastal observations from SOCAT v2 to v3, where the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends on the ability to adapt to different modes of variability. The SOM-FFN achieves this by having independent regression models for each cluster, while this flexibility is intrinsic to the RFR method. Analyses of the estimates show that the respective sensitivity and robustness to outliers of SVR and RFR define the outcome significantly. Further analyses on the methods were performed by using a synthetic dataset to assess the following: which method (RFR or SVR) has the best performance? What is the effect of using time, latitude and longitude as proxy variables on ΔpCO2? What is the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We find that while RFR is indeed better than SVR, the ensemble of the two methods outperforms either one, due to complementary strengths and weaknesses of the methods.
Results also show that for the RFR and SVR implementations, it is better to include coordinates as proxy variables as RMSE scores are lowered and the phasing of the seasonal cycle is more accurate. Lastly, we show that there is only a weak bias due to undersampling. The synthetic data provide a useful framework to test methods in regions of sparse data coverage and show potential as a useful tool to evaluate methods in future studies.
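
The observation that an ensemble of two methods can outperform either one can be illustrated with a small sketch. It uses a plain unweighted average of two prediction vectors; whether the authors combined their SVR and RFR estimates this way is not stated in the abstract, so the averaging scheme, function names and all numbers below are assumptions chosen so that the two hypothetical models err on different samples.

```python
import math
from typing import List, Sequence


def rmse(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    squared = [(t - p) ** 2 for t, p in zip(truth, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_mean(*predictions: Sequence[float]) -> List[float]:
    """Unweighted element-wise average of several prediction vectors."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Made-up values: model A and model B are wrong on different samples.
truth = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 20.0, 28.0, 40.0]
pred_b = [10.0, 18.0, 30.0, 42.0]
pred_ens = ensemble_mean(pred_a, pred_b)

for name, pred in [("A", pred_a), ("B", pred_b), ("ensemble", pred_ens)]:
    print(name, round(rmse(truth, pred), 3))
```

The gain comes from errors that are not perfectly correlated between the two models. If both models made the same mistakes on the same samples, averaging would not reduce the RMSE.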

  • Research Article
  • Cite Count: 84
  • 10.3390/rs12132082
Quantifying Leaf Chlorophyll Concentration of Sorghum from Hyperspectral Data Using Derivative Calculus and Machine Learning
  • Jun 29, 2020
  • Remote Sensing
  • Sourav Bhadra + 6 more

Leaf chlorophyll concentration (LCC) is an important indicator of plant health, vigor, physiological status, productivity, and nutrient deficiencies. Hyperspectral spectroscopy at leaf level has been widely used to estimate LCC accurately and non-destructively. This study utilized leaf-level hyperspectral data with derivative calculus and machine learning to estimate LCC of sorghum. We calculated fractional derivative (FD) orders starting from 0.2 to 2.0 with 0.2 order increments. Additionally, 43 common vegetation indices (VIs) were calculated from leaf spectral reflectance factor to make comparisons with reflectance-based data. Within the modeling pipeline, three feature selection methods were assessed: Pearson’s correlation coefficient (PCC), partial least squares based variable importance in the projection (VIP), and random forest-based mean decrease impurity (MDI). Finally, we used partial least squares regression (PLSR), random forest regression (RFR), support vector regression (SVR), and extreme learning regression (ELR) to estimate the LCC of sorghum. Results showed that: (1) increasing derivative order can show improved model performance until certain order for reflectance-based analysis; however, it is inconclusive to state that a particular order is optimal for estimating LCC of sorghum; (2) VI-based modeling outperformed derivative augmented reflectance factor-based modeling; (3) mean decrease impurity was found effective in selecting sensitive features from large feature space (reflectance-based analysis), whereas simple Pearson’s correlation coefficient worked better with smaller feature space (VI-based analysis); and (4) SVR outperformed all other models within reflectance-based analysis; alternatively, ELR with VIs from original reflectance yielded slightly better results compared to all other models.
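
Of the three feature selection methods named above, ranking by Pearson's correlation coefficient is the simplest to sketch. The version below ranks candidate features by the absolute value of their correlation with the target and keeps the top `top_k`. The function names, the cut-off rule and the data are hypothetical; the abstract does not describe how the authors thresholded the ranking.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for constant inputs)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_by_correlation(
    features: Dict[str, Sequence[float]], target: Sequence[float], top_k: int
) -> List[str]:
    """Return the names of the top_k features with the largest |r|."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up chlorophyll values and three hypothetical vegetation indices.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "vi_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "vi_b": [5.0, 3.0, 4.0, 1.0, 2.0],   # strongly negative
    "vi_c": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated
}
selected = rank_by_correlation(features, target, top_k=2)
print(selected)
```

Using the absolute value keeps strongly negative predictors, which carry as much linear information as positive ones. A filter like this only captures linear, one-feature-at-a-time relationships, which is consistent with the abstract's finding that it suited the smaller VI-based feature space.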

  • Research Article
  • Cite Count: 70
  • 10.1080/01431161.2018.1541110
Mapping pasture biomass in Mongolia using Partial Least Squares, Random Forest regression and Landsat 8 imagery
  • Nov 13, 2018
  • International Journal of Remote Sensing
  • Munkhdulam Otgonbayar + 3 more

The aim of this study was to develop a robust methodology to estimate pasture biomass across the huge land surface of Mongolia (1.56 × 10^6 km2) using high-resolution Landsat 8 satellite data calibrated against field-measured biomass samples. Two widely used regression models were compared and adopted for this study: Partial Least Squares (PLS) and Random Forest (RF). Both methods were trained to predict pasture biomass using a total of 17 spectral indices derived from Landsat 8 multi-temporal satellite imagery as predictor variables. For training, reference biomass data from a field survey of 553 sites were available. PLS results showed a satisfactory correlation between field measured and estimated biomass with coefficient of determination (R2) = 0.750 and Root Mean Square Error (RMSE) = 101.10 kg ha−1. The RF regression gave similar results with R2 = 0.764, RMSE = 98.00 kg ha−1. An examination of feature importance found the following vegetation indices to be the most relevant: Green Chlorophyll Index (CLgreen), Simple Ratio (SR), Wide Dynamic Range Vegetation Index (WDRVI), Enhanced Vegetation Index EVI1 and Normalized Difference Vegetation Index (NDVI) indices. With respect to the spectral reflectances, Red and Short Wavelength Infra-Red2 (SWIR2) bands showed the strongest correlation with biomass. Using the developed PLS models, a spatial map of pasture biomass covering Mongolia at a spatial resolution of 30 m was generated. Our study confirms the high potential of RF and PLS regression (PLSR) models to predict pasture biomass. The computationally simpler PLSR model is preferred for applications involving large regions. This method can be implemented easily, provided that sufficient reference data and cloud-free observations are available.
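
Several of the vegetation indices named above have simple closed forms that can be sketched directly from band reflectances. The formulas below are the commonly cited definitions; the abstract does not say which exact variants or which WDRVI weighting coefficient the authors used, so the default `alpha=0.2` and the reflectance values are assumptions.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def simple_ratio(nir: float, red: float) -> float:
    """Simple Ratio: NIR reflectance over red reflectance."""
    return nir / red


def green_chlorophyll_index(nir: float, green: float) -> float:
    """Green Chlorophyll Index: NIR over green, minus one."""
    return nir / green - 1.0


def wdrvi(nir: float, red: float, alpha: float = 0.2) -> float:
    """Wide Dynamic Range Vegetation Index; alpha down-weights NIR.

    alpha=0.2 is an assumed value, not taken from the abstract.
    """
    return (alpha * nir - red) / (alpha * nir + red)


# Made-up surface reflectances for one pixel.
nir, red, green = 0.5, 0.1, 0.125

print(round(ndvi(nir, red), 4))
print(round(simple_ratio(nir, red), 4))
print(round(green_chlorophyll_index(nir, green), 4))
print(round(wdrvi(nir, red), 4))
```

NDVI and SR use the same two bands but scale differently, while WDRVI shrinks the NIR term so the index keeps responding at high biomass where NDVI tends to saturate.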

  • Research Article
  • Cite Count: 40
  • 10.1016/j.catena.2020.105041
Monitoring properties of the salt-affected soils by multivariate analysis of the visible and near-infrared hyperspectral data
  • Nov 19, 2020
  • CATENA
  • Gopal Ramdas Mahajan + 7 more

  • Research Article
  • Cite Count: 1
  • 10.1016/j.iswcr.2024.10.002
Visible, near-infrared, and shortwave-infrared spectra as an input variable for digital mapping of soil organic carbon
  • Oct 9, 2024
  • International Soil and Water Conservation Research
  • Vahid Khosravi + 5 more

  • Dataset
  • Cite Count: 3
  • 10.22541/au.158212912.27052084
Using hyperspectral remote sensing to monitor the properties of salt-affected soils
  • Feb 19, 2020
  • Authorea
  • Gopal Mahajan + 7 more

The aim of the study was to estimate the properties of salt-affected soils (SAS) using hyperspectral remote sensing. The study was carried out on typical SAS from 372 locations covering 17 coastal districts of the west coast region of India. The spectral reflectance of processed soil samples was recorded in the wavelength range of 350-2500 nm. The full data set (n=372) was split into a calibration dataset (n=260, 70%) to develop the model and a validation dataset (n=112, 30%) to evaluate the performance of the model independently. The spectral data were calibrated using the laboratory-estimated soil properties with five different multivariate techniques: (a) linear – principal component regression (PCR) and partial least squares regression (PLSR) and (b) non-linear – multivariate adaptive regression spline (MARS), random forest (RF) and support vector regression (SVR). In general, the spectral reflectance from the soils decreased with increasing levels of salinity (electrical conductivity, EC). The wavelengths 494, 673, 800, 1415, 1748, 1915, 2207 and 2385 nm showed distinctive absorption characteristics. The study showed significant achievement in predicting soil properties like soil pH, salinity (EC), bulk density (BD), soil available nitrogen (N), exchangeable magnesium (Mg), soil available zinc (Zn) and boron (B) with acceptable to excellent predictions (ratio of performance to deviation (RPD) ranged from 1.48 to 2.06). Amongst the prediction models, SVR, PLSR and PCR were found to be more robust than MARS and RF. The results of the study indicated that visible near-infrared spectroscopy has the potential to predict properties of the SAS.
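
The 70/30 calibration and validation split described above can be sketched as follows. The abstract gives only the resulting sizes (260 and 112 of 372), not how samples were assigned, so the random shuffle, the seed and the function name are assumptions for illustration.

```python
import random
from typing import Iterable, List, Tuple


def split_calibration_validation(
    sample_ids: Iterable[int], calibration_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample ids reproducibly and split them into two groups."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    n_calibration = int(len(ids) * calibration_fraction)
    return ids[:n_calibration], ids[n_calibration:]


calibration, validation = split_calibration_validation(range(372))
print(len(calibration), len(validation))
```

Truncating 372 × 0.7 = 260.4 gives the 260 calibration samples reported in the abstract, leaving 112 for validation. Keeping the validation samples out of model fitting is what makes the reported RPD values an independent check.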
