Fusion of RF algorithm and logistic regression model for high-speed illegal toll evasion vehicle inspection

Abstract

With the rapid expansion of the highway network, toll evasion has become increasingly serious, causing large economic losses. This study constructs an efficient model for recognizing toll-evading vehicles by integrating the random forest algorithm, logistic regression, and neural networks, with the aim of improving inspection efficiency and reducing losses. First, the data are processed with Min-Max normalization to eliminate the influence of differing dimensions and numerical ranges. Second, key features are selected using the random forest algorithm, and the probability of evasion is predicted using a logistic regression model. Finally, the types of evasion are identified using neural networks. The results indicate that the integrated model outperformed the other models in prediction accuracy, classification accuracy, inspection time, mean square error, root mean square error, and stability. With a prediction accuracy of 92%, a recall of 94.71%, and an area under the receiver operating characteristic curve (AUC) of 96%, it demonstrated excellent prediction performance and application value. The study gives highway management departments reliable technical support, improving the accuracy and efficiency of toll evasion inspections and helping to reduce economic losses and maintain traffic order.
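
The first two numeric steps named in the abstract, Min-Max scaling and a logistic probability, can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the example feature values, the weights, and the bias are all hypothetical, and the random forest feature selection and neural network stages are omitted.

```python
import math


def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (v - min) / (max - min).

    A constant column has no spread, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def logistic_probability(features, weights, bias):
    """Return sigmoid(bias + sum(w * x)), the logistic regression probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical trip distances (km) for three vehicles, scaled to [0, 1].
scaled_distance = min_max_scale([10, 20, 30])

# Hypothetical weights and bias; in practice these would be fitted to data.
p_evasion = logistic_probability([scaled_distance[1], 0.2], [1.5, -0.8], -0.3)
```

Scaling each feature to the same range keeps a feature measured in large units from dominating the weighted sum inside the logistic function, which is the motivation the abstract gives for the normalization step.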

Similar Papers
  • Research Article
  • Cited by 6
  • 10.1016/j.apr.2024.102270
Ground visibility prediction using tree-based and random-forest machine learning algorithm: Comparative study based on atmospheric pollution and atmospheric boundary layer data
  • Jul 29, 2024
  • Atmospheric Pollution Research
  • Fuzeng Wang + 5 more

  • Conference Article
  • Cited by 3
  • 10.1109/iccbe56101.2022.9888195
Soil Nitrogen Detection Based on Random Forest Algorithm and Near Infrared Spectroscopy
  • May 27, 2022
  • Beitian Zheng + 4 more

To achieve rapid detection of soil nitrogen content, a method combining near-infrared (NIR) spectroscopy with a random forest (RF) regression algorithm was proposed. The spectral data and nitrogen contents of 143 soil samples were collected to establish the detection model. The results show that selecting the RF modeling data by ∆Gini extracts the spectral information related to soil nitrogen content and reduces redundant information in the data. The established RF model predicts accurately, with a correlation coefficient of 0.909 and a root mean square error of 0.1412 on the test set. This study demonstrates the feasibility of combining NIR spectroscopy with a random forest algorithm to predict soil nitrogen content and provides a theoretical basis for the subsequent development of soil composition testing instruments.

  • Research Article
  • Cited by 27
  • 10.1016/j.catena.2022.106404
Machine learning for cation exchange capacity prediction in different land uses
  • May 28, 2022
  • CATENA
  • Gaurav Mishra + 11 more

  • Research Article
  • Cited by 10
  • 10.3390/agriculture13020388
New Spectral Index and Machine Learning Models for Detecting Coffee Leaf Miner Infestation Using Sentinel-2 Multispectral Imagery
  • Feb 6, 2023
  • Agriculture
  • Emerson Ferreira Vilela + 8 more

The coffee leaf miner (Leucoptera coffeella) is a key coffee pest in Brazil that can cause severe defoliation and a negative impact on the productivity. Thus, it is essential to identify initial pest infestation for the sake of appropriate time control to avoid further economic damage to the coffee crops. A fast non-destructive method is an important tool that can be used to monitor the occurrence of the coffee leaf miner. The present work aims to identify the occurrence of coffee leaf miner infestation through a new vegetation index, using multispectral images from the Sentinel-2 satellite and the Google Earth Engine platform. Coffee leaf miner infestation was measured in the field in four cities in the state of Minas Gerais. The largest infestations occurred in September, October, and November but particularly in October 2021, in which the rate of infestation reached 85%, followed by September 2020 with a maximum infestation of 76%. The calculation steps of the vegetation indices and mappings were carried out in the Google Earth Engine cloud processing platform through the development of a script in JavaScript programming language. Combinations of two sensitive bands were selected to detect coffee leaf miner infestation, and from these, the “Coffee-Leaf-Miner Index” was developed, which was compared with other existing vegetation indices in terms of their performance for coffee leaf miner detection. The combination of the NIR–BLUE and NIR–RED bands was more sensitive for the detection of coffee leaf miner infestation; therefore, the NIR, BLUE, and RED bands were selected to develop the new index. The “Coffee-Leaf-Miner Index” presented the best performance among those evaluated, with a coefficient of determination of about 0.87, a root mean square error of 4.92% coffee leaf miner infestation, accuracy of 89.47%, and kappa coefficient of 95.39. 
The R2 of the other spectral indices from the literature that were used in this study ranged from 0.017 to 0.867, and their root mean square error ranged from 4.996 to 13.582% coffee leaf miner infestation. Machine learning was then applied, using the supervised Random Forest and Support Vector Machine algorithms, to recognize patterns of coffee leaf miner infestation in the field; only the Coffee-Leaf-Miner Index was used for the identification test. The Support Vector Machine with a linear kernel was applied to establish a discrimination model. The number of trees for the Random Forest classifier was 100. The Support Vector Machine performed worse than the Random Forest algorithm, but both achieved above 80% user and producer precision. Three bands (Blue, Red, NIR) were selected to create the new index, which showed capacity for remote detection of coffee leaf miner infestation on a regional scale. Thus, the “Coffee-Leaf-Miner Index” can identify coffee leaf miner infestation despite all the complexity involved in detecting pests via orbital remote sensing.

  • Research Article
  • Cited by 3
  • 10.1007/s10706-017-0420-8
Random Forest Tree Based Approach for Blast Design in Surface Mine
  • Dec 1, 2017
  • Geotechnical and Geological Engineering
  • Arvind K Mishra + 3 more

Blasting is one of the primary mining operations for extracting minerals and ores; however, if not designed properly, it may have varying degrees of environmental and socio-economic impact in and around mining areas. In the Indian mining industry, blast designs are fundamentally based on the experience and capability of the blasting crew, and their assessment is largely qualitative, relying on conventional trial and error. With changes in site geology and geotechnical parameters, the blast design parameters also require alteration, which can be standardized through the development of an intelligent system such as a neural network. In this paper, artificial neural network and random forest algorithms are used to obtain better blast designs. Over 120 blast results from an opencast coal mine were used to predict burden and energy factor, with blast hole diameter, bench height to stemming ratio, nature of strata, and average fragment size as input parameters. Of the 120 data sets, 85 recorded at a surface coal mine were used to train the model and 20 for validation. The coefficient of determination and root mean square error (RMSE) were chosen as the indicators for identifying the optimum neural network and random forest models. The root mean square values obtained are 0.153 for energy factor and 0.1947 for burden. Similarly, the RMSE values obtained using the random forest tree algorithm are 0.48 for burden and 50.76 for energy factor. The results reveal that the random forest tree system has the potential to produce better, non-generic blast designs and can be a useful tool for blasting engineers designing optimum blasts for mines.

  • Research Article
  • Cited by 17
  • 10.3390/s24041112
Missing Data Imputation Method Combining Random Forest and Generative Adversarial Imputation Network
  • Feb 8, 2024
  • Sensors
  • Hongsen Ou + 2 more

(1) Background: To solve the problem of missing time-series data caused by the acquisition system or external factors, a missing time-series data interpolation method based on random forest and a generative adversarial interpolation network is proposed. (2) Methods: First, the position of the missing part of the data is calibrated, and the trained random forest algorithm is used for the first data interpolation. The output value of the random forest algorithm is used as the input value of the generative adversarial interpolation network, and the generative adversarial interpolation network is used to calibrate the position. The data are then interpolated a second time, combining the advantages of the two algorithms to bring the interpolation result closer to the true value. (3) Results: The filling effect of the algorithm is tested on a bearing data set, and the root mean square error (RMSE) is used to evaluate the interpolation results. The results show that the RMSE values of the interpolation results based on the random forest and generative adversarial interpolation network algorithms in the case of single-segment and multi-segment missing data are only 0.0157, 0.0386, and 0.0527, which is better than the random forest algorithm, generative adversarial interpolation network algorithm, and K-nearest neighbor algorithm. (4) Conclusions: The proposed algorithm performs well in each data set and provides a reference method in the field of data filling.

  • Research Article
  • Cited by 31
  • 10.3233/jifs-201921
An efficient data prediction model using hybrid Harris Hawk Optimization with random forest algorithm in wireless sensor network
  • Jan 1, 2020
  • Journal of Intelligent & Fuzzy Systems
  • S Ramalingam + 1 more

Wireless Sensor Networks (WSNs) continuously gather environmental weather data from sensor nodes on a random basis. The wireless sensor node sends the data via the base station to the cloud server, which frequently consumes excessive power during transmission. In distribution mode, a WSN typically produces imprecise or missing data as well as redundant data, which affect the whole network. To overcome this problem, an effective data prediction model was developed for decentralized photovoltaic plants using a hybrid of Harris Hawk Optimization and the Random Forest algorithm (HHO-RF), based primarily on the ensemble learning approach. This work aims to predict data precisely and minimize error at the WSN node. An efficient data reduction model based on Principal Component Analysis (PCA) is proposed for processing data from the sensor network. The datasets were gathered from the Tamil Nadu photovoltaic power plant, India. A low-cost portable wireless sensor node was developed for collecting PV plant weather data using the Internet of Things (IoT). The experimental outcomes of the proposed hybrid HHO-RF approach were compared with four other algorithms: Linear Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Long Short Term Memory (LSTM). Results show that the determination coefficient (R2), Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) values of the HHO-RF model are 0.9987, 0.0693, 0.2336, and 0.15881, respectively. For the prediction of air temperature, the RMSE of the proposed model is 3.82%, 3.84%, and 6.92% on the lowest, average, and highest weather days. The proposed hybrid HHO-RF model performs better than the existing algorithms.

  • Research Article
  • Cited by 6
  • 10.3390/rs15225351
Soil Texture Mapping in Songnen Plain of China Using Sentinel-2 Imagery
  • Nov 14, 2023
  • Remote Sensing
  • Miao Zheng + 5 more

Soil texture is a key physical property that affects the soil’s ability to retain moisture and nutrients. As a result, it is of extreme importance to conduct remote sensing monitoring of soil texture. Songnen Plain is located in the black soil belt of Northeast China. The development of satellite imagery in remote sensing technology enables the rapid monitoring of large areas. This study aimed to map the surface soil texture of cultivated land in Songnen Plain using Sentinel-2 images and Random Forest (RF) algorithm. We conducted this study by collecting 354 topsoil (0–20 cm) samples in Songnen Plain and evaluating the effectiveness of the bands and spectral indices of Sentinel-2 images and RF algorithm in predicting soil texture (sand, silt, and clay fractions). The results demonstrated that the 16 covariates were moderately and highly correlated with soil texture. And, Band11 of Sentinel-2 images could be used as the corresponding band of soil texture. For sand fraction, the Sentinel-2 images and RF algorithm’s Coefficient of Determination (R2) and Root Mean Square Error (RMSE) were 0.77 and 10.48%, respectively, and for silt fraction, they were 0.75 and 9.38%. Sand fraction decreased from southwest to northeast in Songnen Plain, while silt and clay fractions increased. We found that the Songnen Plain was affected by water erosion and wind erosion, in the northeast and southwest, respectively, providing reference for the implementation of Conservation Tillage policies. The outcome of the study can provide reference for future soil texture mapping with a high resolution.

  • Research Article
  • Cited by 15
  • 10.32604/cmc.2022.019882
An Ensemble Methods for Medical Insurance Costs Prediction Task
  • Jan 1, 2022
  • Computers, Materials & Continua
  • Nataliya Shakhovska + 3 more

The paper reports three new ensembles of supervised learning predictors for managing medical insurance costs. An open dataset is used to develop the data analysis methods. The use of artificial intelligence in the management of financial risks will help save time and money and protect patients’ health. Machine learning is associated with many expectations, but its quality is determined by choosing a good algorithm and the proper steps to plan, develop, and implement the model. The paper aims to develop three new ensembles for individual insurance cost prediction that provide high prediction accuracy. The Pearson coefficient and the Boruta algorithm are used for feature selection. Boosting, stacking, and bagging ensembles are built, and a comparison with existing machine learning algorithms is given. Boosting models based on regression trees and stochastic gradient descent are built. Bagged CART and Random Forest algorithms are proposed. The boosting and stacking ensembles showed better accuracy than bagging. Tuning the parameters for boosting does not allow the RMSE to be decreased either. So, bagging shows its weakness in generalizing the prediction. The stacking ensemble is developed using K Nearest Neighbors (KNN), Support Vector Machine (SVM), Regression Tree, Linear Regression, and Stochastic Gradient Boosting. The random forest (RF) algorithm is used to combine the predictions. One hundred trees are built for RF. The Root Mean Square Error (RMSE) improved to 3173.213 in comparison with other predictors. The quality of the developed ensemble for the Root Mean Squared Error metric is 1.47 better than that of the best weak predictor (SVR).

  • Research Article
  • 10.3390/civileng6020021
Explainable Machine Learning to Predict the Construction Cost of Power Plant Based on Random Forest and Shapley Method
  • Apr 5, 2025
  • CivilEng
  • Suha Falih Mahdi Alazawy + 5 more

This study aims to develop a reliable method for predicting power plant construction costs during the early planning stages using ensemble machine learning techniques. Accurate cost predictions are essential for project feasibility, and this research highlights the strength of ensemble methods in improving prediction accuracy by combining the advantages of multiple models, offering a significant improvement over traditional approaches. This investigation employed the Random Forest (RF) algorithm to estimate the overall construction cost of a power plant. The RF algorithm was contrasted with single-learner machine learning models: Support Vector Regression (SVR) and k-Nearest Neighbors (KNN). Performance measures, comprising the coefficient of determination (R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), were used to evaluate and contrast the performance of the implemented models. Statistical measures demonstrated that the RF approach surpassed alternative models, demonstrating the highest coefficient of determination for testing (R2=0.956) and the lowest Root Mean Square Error (RMSE = 29.27) for the testing dataset. The Shapley Additive Explanation (SHAP) technique was implemented to explain the significance and impact of predictor factors affecting power plant construction costs. The outcomes of this investigation provide crucial information for project decision-makers, allowing them to reduce discrepancies in projected costs and make informed decisions at the beginning of the construction phase.

  • Research Article
  • Cited by 6
  • 10.1159/000497424
Prediction Model of Cardiac Risk for Dental Extraction in Elderly Patients with Cardiovascular Diseases
  • May 2, 2019
  • Gerontology
  • Min Tang + 5 more

Background: With the rapidly increasing population of elderly people, dental extraction in elderly individuals with cardiovascular diseases (CVDs) has become quite common. The issue of how to assure the safety of elderly patients with CVDs undergoing dental extraction has perplexed dentists and internists for many years. And it is important to derive an appropriate risk prediction tool for this population. Objectives: The aim of this retrospective, observational study was to establish and validate a prediction model based on the random forest (RF) algorithm for the risk of cardiac complications of dental extraction in elderly patients with CVDs. Methods: Between August 2017 and May 2018, a total of 603 patients who fulfilled the inclusion criteria were used to create a training set. An independent test set contained 230 patients between June 2018 and July 2018. Data regarding clinical parameters, laboratory tests, clinical examinations before dental extraction, and 1-week follow-up were retrieved. Predictors were identified by using logistic regression (LR) with penalized LASSO (least absolute shrinkage and selection operator) variable selection. Then, a prediction model was constructed based on the RF algorithm by using a 5-fold cross-validation method. Results: The training set, based on 603 participants, including 282 men and 321 women, had an average participant age of 72.38 ± 8.31 years. Using feature selection methods, 11 predictors for risk of cardiac complications were screened out. When the RF model was constructed, its overall classification accuracy was 0.82 at the optimal cutoff value of 18.5%. In comparison to the LR model, the RF model showed a superior predictive performance. The AUROC (area under the receiver operating characteristic curve) scores of the RF and LR models were 0.83 and 0.80, respectively, in the independent test set. 
The AUPRC (area under the precision-recall curve) scores of the RF and LR models were 0.56 and 0.35, respectively, in the independent test set. Conclusion: The RF-based prediction model is expected to be applicable for preoperative clinical assessment for preventing cardiac complications in elderly patients with CVDs undergoing dental extraction. The findings may aid physicians and dentists in making more informed recommendations to prevent cardiac complications in this patient population.

  • Conference Article
  • 10.1109/ichve53725.2022.9961606
On-line vacuum degree monitoring of vacuum circuit breaker based on laser-induced breakdown spectroscopy combined with random forest algorithm
  • Sep 25, 2022
  • Feilong Zhang + 5 more

As essential switchgear in power systems, the vacuum circuit breaker is increasingly widely used in medium-voltage fields owing to its strong arc extinguishing ability, lack of pollution, and compact structure. Traditional offline vacuum detection cannot meet the demand for vacuum breaker reliability monitoring in actual engineering. In this study, we propose a method for on-line vacuum monitoring of vacuum circuit breakers based on laser-induced breakdown spectroscopy (LIBS) combined with a variable importance random forest (VI-RF) algorithm. The experiments use the LIBS platform to collect spectral data of the target in the vacuum cavity under nine different pressure conditions from 10^-3 Pa to 10^5 Pa. The spectral lines of four elements in the target material and ambient gas are then selected from the original data as the data set. A random forest model is established to calculate the pressure level from the spectral data. One-third of the data is fed to the RF model for training and the other two-thirds is used for testing. Different data preprocessing methods (standard normalization, multivariate scattering correction, first-order derivative, wavelet transform) are adopted to increase the classification ability of the model. We use the correlation coefficient (R^2) and root mean square error (RMSE) as evaluation indexes. The effects of the number of decision trees (n_tree) and the number of features to be selected (m_try) on the accuracy of air pressure estimation are investigated.
The results show that the accuracy of this method exceeds 99% and can reach the accuracy of traditional vacuum monitoring technology, which proves that LIBS technology combined with the VI-RF algorithm is a new method that can be used for on-line vacuum monitoring of vacuum circuit breakers.

  • Conference Article
  • Cited by 4
  • 10.2118/212044-ms
Random Forest Ensemble Model for Reservoir Fluid Property Prediction
  • Aug 1, 2022
  • Yisa Adeeyo

Reservoir fluid PVT properties are measured in the laboratory for various uses in reservoir engineering evaluation and estimation. Despite the indispensability of these PVT parameters, PVT lab data are seldom available and, if available, may be unreliable. Instead, various empirical models have been developed and used in the industry. These empirical models are inherently inaccurate when used to predict PVT properties of fluids from different geological regions with different depositional environments and fingerprints. Artificial Intelligence (AI) has evolved over the years and provided some algorithms with the potential to yield accurate predictive models for bubblepoint pressure. This work tested some AI algorithms, compared their performance, and chose the random forest regression algorithm to develop a robust predictive model for estimating bubblepoint pressure. Two thousand five hundred and twenty-two datasets obtained from oil reservoirs in different geographical locations were used for the feature scaling of input data and for training and testing the models. The independent variables (gas-oil ratio, temperature, oil density, and gas density) were confirmed to have a key influence on the dependent variable, bubblepoint pressure. The random forest model developed uses an ensemble learning approach, combining predictions from multiple machine learning algorithms by averaging them to make a more accurate prediction. The ‘forest’ generated by the random forest algorithm was trained through bootstrap aggregating, an ensemble meta-algorithm that improves the accuracy of machine learning algorithms. The data were split 70% for training and 30% for testing. The reliability, accuracy, and completeness of the predictive model were assessed through performance indices such as the root mean square error (RMSE) and mean absolute error (MAE). The best network architecture was determined along with the corresponding test set RMSE and correlation coefficient. Statistical and graphical error analysis of the results showed that the random forest model performed better than existing models, with a correlation coefficient of 0.98 for bubblepoint pressure. Better accuracy in predicting reservoir properties could be achieved using this random forest reservoir fluid properties prediction model.

  • Research Article
  • Cited by 31
  • 10.1016/j.compag.2021.106063
Empirical model for forecasting sugarcane yield on a local scale in Brazil using Landsat imagery and random forest algorithm
  • Mar 30, 2021
  • Computers and Electronics in Agriculture
  • Ana Cláudia Dos Santos Luciano + 5 more

  • Research Article
  • Cited by 6
  • 10.1093/forestry/cpac036
Estimation of aboveground carbon stock using Sentinel-2A data and Random Forest algorithm in scrub forests of the Salt Range, Pakistan
  • Sep 11, 2022
  • Forestry: An International Journal of Forest Research
  • Sobia Bhatti + 3 more

Forest ecosystems play a vital role in the global carbon cycle, as forests store ~283 Gt of carbon globally and hence help mitigate climate change. Carbon stock estimation is the key step for assessing the mitigation potential of a given forest. About 5–10 Gt CO2 equivalent emissions come from deforestation and forest degradation annually. Pakistan’s forest resources are currently deteriorating due to deforestation and degradation and are thereby becoming a source of carbon dioxide emissions. One forest type that has been examined little so far in this context is subtropical scrub forests. This research suggests a workflow to estimate the carbon stock from three carbon pools (aboveground, belowground and litter) in scrub forests of the Salt Range, Pakistan by incorporating remote sensing and geographic information system techniques. The study’s objectives include estimating biomass and carbon stocks using field inventory data and allometric equations, quantifying CO2 sequestration using the ‘IPCC 2006 Guidelines for National Greenhouse Gas Inventories’, and finally mapping biomass and carbon using satellite imagery and statistical analysis. For prediction and mapping of biomass and carbon, field plot data along with vegetation indices and spectral bands of the Sentinel-2A satellite imagery were fed into a Random Forest (RF) algorithm in the cloud computing Google Earth Engine platform. Our results of ground data suggest that the examined scrub forests harbour 243 917 t of biomass, 114 989 t of carbon and 422 009 t of CO2 equivalent in the three carbon pools of the study area with a mean biomass density of 12.04 t ha−1 (±5.31) and mean carbon density of 5.72 t ha−1 (±2.46). The RF model showed good performance with reasonable R2 (0.53) and root mean square error (3.64 t ha−1) values and predicted average biomass at 13.93 t ha−1 (±4.35) and mean carbon density of 6.55 t ha−1 (±2.05).
The total predicted and field-measured biomass values differ by a plausible amount, while the mean values differ only minimally. The red-edge region and short-wave infrared (SWIR) region of the Sentinel-2A spectrum showed a strong relationship with aboveground biomass estimates from the field. We conclude that Sentinel-2A data coupled with ground data is a cost-effective and reliable tool for estimating various carbon pools in the scrub forests at a regional scale and may contribute to formulating policies to manage forests sustainably, enhance forest cover and conserve biodiversity.
