Adaptive Non-Hermitian GFDM for Indoor VLC Using Random Forest Regression
- Research Article
- 10.1016/j.rsma.2024.103655
- Jun 29, 2024
- Regional Studies in Marine Science
Mapping benthic sediment types and composition in a turbid Jamaican bay using hydroacoustic data and different spatially explicit interpolation techniques
- Research Article
7
- 10.1109/access.2020.3027828
- Jan 1, 2020
- IEEE Access
Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potencies of their biological activities. In QSAR models, the physical and chemical information of the molecules is encoded into numerical values called descriptors. Recently, experimental test results (profiles) have been used as descriptors of chemicals. The Profile QSAR 2.0 (pQSAR) model proposed by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) and partial least squares regression (PLSR). In the pQSAR model, one fills the profile table's missing values with RFRs and then builds a PLSR using the profile predictions. Note that in the second step of the pQSAR method, the PLSR's predictor variables are profiles, that is, activity values, and the response variables are also activity values. Thus we can use the PLSRs to update the profile table and then repeat the second step. In this work, we propose an extended pQSAR model generated by RFRs and PLSRs. Experiments in which the full, initially predicted profile table is updated iteratively by the two kinds of prediction models, RFRs and PLSRs, were conducted for the PKIS and ChEMBL data sets. Even though the prediction performance of individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables for a given iteration shows better performance. This ensemble model has better prediction performance, in the sense of Pearson's R², than the pQSAR model.
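The ensemble step described in this abstract (averaging several predicted profile tables and scoring the result with Pearson's R²) can be illustrated with a minimal pure-Python sketch. The function names `average_tables` and `pearson_r2`, the list-of-rows table layout, and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the RFR and PLSR fitting steps are not shown.

```python
def average_tables(tables):
    """Element-wise mean of equally shaped prediction tables (lists of rows)."""
    n = len(tables)
    rows, cols = len(tables[0]), len(tables[0][0])
    return [[sum(t[i][j] for t in tables) / n for j in range(cols)]
            for i in range(rows)]


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov * cov / (var_t * var_p)


# Hypothetical predicted profile tables (rows: compounds, columns: assays),
# standing in for the outputs of different RFR/PLSR update sequences.
table_a = [[1.0, 2.0], [3.0, 4.0]]
table_b = [[3.0, 2.0], [5.0, 8.0]]
ensemble = average_tables([table_a, table_b])
```

In practice each table would come from a different sequence of model updates, and `pearson_r2` would be evaluated per assay on held-out activity values.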
- Research Article
2
- 10.5194/tc-19-37-2025
- Jan 8, 2025
- The Cryosphere
Abstract. Firn density plays a crucial role in assessing the surface mass balance of the Antarctic ice sheet. However, our understanding of the spatial and temporal variations in firn density is limited due to (i) spatial and temporal limitations of in situ measurements, (ii) potential modelling uncertainties, and (iii) lack of firn density products driven by satellite remote-sensing data. To address this gap, this paper explores the potential of satellite microwave radiometer (Special Sensor Microwave Imager/Sounder (SSMIS)) and scatterometer (Advanced Scatterometer (ASCAT)) observations for assessing spatial and temporal dynamics of dry-firn density over the Antarctic ice sheet. Our analysis demonstrates a clear relation between density anomalies at a depth of 40 cm and fluctuations in satellite observations. However, a linear relationship with individual satellite observations is insufficient to explain the spatial and temporal variation in snow density. Hence, we investigate the potential of a non-linear random forest (RF) machine learning approach trained on radiometer and scatterometer data to derive the spatial and temporal variations in dry-firn density. In the estimation process, 10 years of SSMIS observations (brightness temperature) and ASCAT observations (backscatter intensity) are used as input features to an RF regressor. The regressor is first trained on time series of modelled density and satellite observations at randomly sampled pixels and then applied to estimate densities in dry-firn areas across Antarctica. The RF results reveal a strong agreement between the spatial patterns estimated by the RF regressor and the modelled densities. The estimated densities exhibit an error of ±10 kg m⁻³ in the interior of the ice sheet and ±35 kg m⁻³ towards the ocean. 
However, the temporal patterns show some discrepancies, as the RF regressor tends to overestimate summer densities, except for high-elevation regions in East Antarctica and specific areas in West Antarctica. These errors may be attributed to underestimations of short-term or seasonal variations in the modelled density and the limitations of RF in extrapolating values outside the training data. Overall, our study presents a potential method for estimating unknown Antarctic firn densities using known densities and satellite parameters.
- Research Article
20
- 10.2196/medinform.5650
- Jul 21, 2016
- JMIR Medical Informatics
Background: Modeling patient flow is crucial in understanding resource demand and prioritization. We study patient outflow from an open ward in an Australian hospital, where currently bed allocation is carried out by a manager relying on past experiences and looking at demand. Automatic methods that provide a reasonable estimate of total next-day discharges can aid in efficient bed management. The challenges in building such methods lie in dealing with large amounts of discharge noise introduced by the nonlinear nature of hospital procedures, and the nonavailability of real-time clinical information in wards. Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments.
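The four forecast-quality measures named in this abstract, and a relative improvement over a baseline, can be computed as in the sketch below. The abstract does not state its exact definitions, so the sign convention for mean forecast error (forecast minus actual) and the sMAPE variant (absolute error divided by the mean of absolute actual and forecast, in percent) are assumptions. The helper names `forecast_errors` and `improvement_pct` and the toy discharge counts are hypothetical.

```python
from math import sqrt


def forecast_errors(actual, forecast):
    """Return MFE, MAE, sMAPE (%) and RMSE for two equal-length sequences."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]  # assumed sign convention
    mfe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    # Assumed sMAPE variant; pairs where both values are zero contribute nothing.
    smape = 100.0 / n * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2.0)
        for a, f in zip(actual, forecast)
        if a != 0 or f != 0
    )
    return {"mfe": mfe, "mae": mae, "smape": smape, "rmse": rmse}


def improvement_pct(baseline_error, model_error):
    """Relative reduction of an error measure versus a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy next-day discharge counts (made-up numbers, not from the study).
actual = [10, 12, 8, 11]
forecast = [9, 12, 10, 11]
scores = forecast_errors(actual, forecast)
```

Read this way, a reported 22.7% improvement in mean absolute error corresponds to `improvement_pct(baseline_mae, model_mae)` evaluating to about 22.7 when the moving average model is the baseline.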
- Research Article
18
- 10.1002/minf.201700078
- Nov 14, 2017
- Molecular Informatics
This paper presents novel QSAR models for the prediction of antitrypanosomal activity among thiazolidines and related heterocycles. The performance of four machine learning algorithms (Random Forest regression, stochastic gradient boosting, multivariate adaptive regression splines and Gaussian processes regression) has been studied in order to reach better levels of predictivity. The results for Random Forest and Gaussian processes regression are comparable and outperform the other studied methods. Preliminary descriptor selection with the Boruta method improved the outcome of the machine learning methods. The two novel QSAR models developed with the Random Forest and Gaussian processes regression algorithms have good predictive ability, as shown by external evaluation on the test set with corresponding Q²ext = 0.812 and Q²ext = 0.830. The obtained models can be used further for in silico screening of virtual libraries in the same chemical domain in order to find new antitrypanosomal agents. Thorough analysis of the influence of descriptors in the QSAR models and interpretation of their chemical meaning makes it possible to highlight a number of structure-activity relationships. The presence of phenyl rings with electron-withdrawing atoms or groups in the para-position, an increased number of aromatic rings, high branching but short chains, high HOMO energy, and the introduction of a 1-substituted 2-indolyl fragment into the molecular structure have been recognized as prerequisites for trypanocidal activity.
- Research Article
252
- 10.3835/plantgenome2012.02.0001
- Jul 1, 2012
- The Plant Genome
Fusarium head blight (FHB) resistance is quantitative and difficult to evaluate. Genomic selection (GS) could accelerate FHB resistance breeding. We used U.S. cooperative FHB wheat nursery data to evaluate GS models for several FHB resistance traits including deoxynivalenol (DON) levels. For all traits we compared the models: ridge regression (RR), Bayesian LASSO (BL), reproducing kernel Hilbert spaces (RKHS) regression, random forest (RF) regression, and multiple linear regression (MLR) (fixed effects). For DON, we evaluated additional prediction methods including bivariate RR models, phenotypes for correlated traits, and RF regression models combining markers and correlated phenotypes as predictors. Additionally, for all traits, we compared different marker sets including genomewide markers, FHB quantitative trait loci (QTL) targeted markers, and both sets combined. Genomic selection accuracies were always higher than MLR accuracies, RF and RKHS regression were often the most accurate methods, and for DON, marker plus trait RF regression was more accurate than all other methods. For all traits except DON, using QTL targeted markers alone led to lower accuracies than using genomewide markers. This study indicates that cooperative FHB nursery data can be useful for GS, and prior information about correlated traits and QTL could be used to improve accuracies in some cases.
- Preprint Article
- 10.5194/egusphere-gc8-hydro-67
- May 8, 2023
Upscaling of soil water content (SWC) information to the large scale (>10 km) is highly desired to address the increasing demand for SWC products in various sectors. Random forest (RF) regression has been suggested as a suitable method to generate large SWC maps from a limited number of observations. RF uses multiple prediction variables (predictors) to derive the missing values of a desired variable (e.g. SWC) based on their internal relationship. Cosmic ray neutron sensing (CRNS) is an alternative method for passive SWC mapping and monitoring, either by stationary CRNS sensors or by mobile CRNS roving. CRNS has a certain advantage over most classical hydrogeophysical approaches because of its footprint at the hectare scale and beyond, which is particularly true for roving data and qualifies CRNS data as suitable input for RF regressions. However, CRNS roving data commonly contain a high amount of noise and outlier values, related to the statistical distribution of neutron counting, which hinders signal interpretation and could lower the quality of the RF regression. So far there are two ways to overcome the noise problem and achieve higher data stability: (i) increasing the aggregation time, which decreases the signal uncertainty but also reduces the spatial resolution, and (ii) applying smoothing algorithms, e.g. interpolation or moving averages, which results in more stable values but does not solve the outlier problem.
We used SWC data from CRNS roving along the Selketal catchment in the Harz mountains, Germany, to test the performance of score criteria for an adaptive removal of potential outliers. The score criteria are internal test parameters that indicate the probability that a value is an outlier. Each observation was therefore subjected to a group of queries testing its conformity to the surrounding values by selected statistical parameters. Based on the total score of the queries, the potentially unreliable observations were removed using various thresholds, and the filtered data were used as input for the RF regression. RF regression was performed using static (e.g. topographical indices, soil properties) and dynamic (precipitation) predictors, generating SWC maps for an area of ~2700 km². SWC input data were split into training (~2/3) and validation (~1/3) sets.
Preliminary results showed that applying the score criteria resulted in a more stable spatial pattern and improved the R² from 0.099 to 0.196, 0.266 and 0.308 for scores 6, 4 and 3, respectively. The root mean squared error also decreased with stronger filtering, from 0.14 for the original dataset to 0.078 for score 3. However, with the score 3 threshold, 22.4% of the data were omitted. Hence, an optimization between the amount of excluded data and the resulting improvement in prediction needs to be developed and tested. Implementing the spatial relationship between the observations and weighting the score values according to their importance should further increase the performance. Due to its easy application and its adjustable criteria selection, the proposed filtering approach has the potential to become more popular in CRNS roving studies.
- Research Article
47
- 10.1155/2020/2158573
- Apr 13, 2020
- Applied and Environmental Soil Science
Soil organic carbon constitutes an important indicator of soil fertility. The purpose of this study was to predict soil organic carbon content in the mountainous terrain of eastern Lesotho, southern Africa, which is an area of high endemic biodiversity as well as an area extensively used for small-scale agriculture. An integrated field and laboratory approach was undertaken, through measurements of reflectance spectra of soil using an Analytical Spectral Device (ASD) FieldSpec® 4 optical sensor. Soil spectra were collected on the land surface under field conditions and then on soil in the laboratory, in order to assess the accuracy of field spectroscopy-based models. The predictive performance of two different statistical models (random forest and partial least square regression) was compared. Results show that random forest regression can most accurately predict the soil organic carbon contents on an independent dataset using the field spectroscopy data. In contrast, the partial least square regression model overfits the calibration dataset. Important wavelengths to predict soil organic contents were localised around the visible range (400–700 nm). This study shows that soil organic carbon can be most accurately estimated using derivative field spectroscopy measurements and random forest regression.
- Research Article
- 10.55681/jige.v5i2.2794
- Jun 28, 2024
- Jurnal Ilmiah Global Education
Having a place to live, whether a house or another shelter, is a basic human need. With Indonesia's population growing rapidly every year, many people do not have a decent place to live. Careful planning is therefore needed so that every family can have a decent home. One very important aspect of planning a property investment is predicting future house prices. One approach is to use the machine learning algorithms Random Forest and Multiple Linear Regression. Several factors can influence the price of a house, including land area, building area, and the number of bedrooms, bathrooms and garages. In this research, multiple linear regression and random forest regression were chosen, with the aim of finding which of the two methods gives the best prediction results. To achieve accurate predictions, the experiments were repeated with the dataset divided into 80% for training and 20% for testing. The results show that the random forest regression algorithm provides the best results, with an accuracy of 81.6%.
- Research Article
165
- 10.3390/pr9112015
- Nov 11, 2021
- Processes
Non-traditional machining (NTM) has gained significant attention in the last decade due to its ability to machine conventionally hard-to-machine materials. However, NTMs suffer from several disadvantages such as higher initial cost, lower material removal rate, more power consumption, etc. NTMs involve several process parameters, the appropriate tweaking of which is necessary to obtain economical and suitable results. However, the costly and time-consuming nature of the NTMs makes it a tedious and expensive task to manually investigate the appropriate process parameters. The NTM process parameters and responses are often not linearly related and thus, conventional statistical tools might not be enough to derive functional knowledge. Thus, in this paper, three popular machine learning (ML) methods (viz. linear regression, random forest regression and AdaBoost regression) are employed to develop predictive models for NTM processes. By considering two high-fidelity datasets from the literature on electro-discharge machining and wire electro-discharge machining, case studies are shown in the paper for the effectiveness of the ML methods. Linear regression is observed to be insufficient in accurately mapping the complex relationship between the process parameters and responses. Both random forest regression and AdaBoost regression are found to be suitable for predictive modelling of NTMs. However, AdaBoost regression is recommended as it is found to be insensitive to the number of regressors and thus is more readily deployable.
- Research Article
7
- 10.22190/fume210728071b
- Dec 16, 2023
- Facta Universitatis, Series: Mechanical Engineering
In the present-day manufacturing environment, the modeling of a machining process with the help of statistical and machine learning techniques in order to understand the material removal mechanism and study the influences of the input parameters on the responses has become essential for cost optimization and effective resource utilization. In this paper, using a past CNC face milling dataset with 27 experimental observations, a random forest (RF) regressor is employed to effectively predict the response values of the said process for given sets of input parameters. The considered milling dataset consists of four input parameters, i.e. cutting speed, feed rate, depth of cut and width of cut, and three responses, i.e. material removal rate, surface roughness and active energy consumption. The RF regressor is an ensemble learning method where multiple decision trees are combined together to provide better prediction results with minimum variance and overfitting of data. Its prediction performance is validated using five statistical metrics, i.e. mean absolute percentage error, root mean squared percentage error, root mean squared logarithmic error, correlation coefficient and root relative squared error. It is observed that the RF regressor can be deployed as an effective prediction tool with minimum feature selection for any of the machining processes.
- Research Article
32
- 10.3847/1538-4357/ab2ece
- Aug 10, 2019
- The Astrophysical Journal
We explore the interrelationships between the galaxy group halo mass and various observable group properties. We propose a simple scenario that describes the evolution of the central galaxies and their host dark matter halos. Star formation quenching is one key process in this scenario, which leads to the different assembly histories of blue groups (group with a blue central) and red groups (group with a red central). For blue groups, both the central galaxy and the halo continue to grow their mass. For red groups, the central galaxy has been quenched and its stellar mass remains about constant, while its halo continues to grow by merging smaller halos. From this simple scenario, we speculate about the driving properties that should strongly correlate with the group halo mass. We then apply the machine learning algorithm the Random Forest (RF) regressor to blue groups and red groups separately in the semianalytical model L-GALAXIES to explore these nonlinear multicorrelations and to verify the scenario as proposed above. Remarkably, the results given by the RF regressor are fully consistent with the prediction from our simple scenario and hence provide strong support for it. As a consequence, the group halo mass can be more accurately determined from observable galaxy properties by the RF regressor with a 50% reduction in error. A halo mass more accurately determined in this way also enables more accurate investigations on the galaxy–halo connection and other important related issues, including galactic conformity and the effect of halo assembly bias on galaxy assembly.
- Research Article
1
- 10.1016/j.rineng.2025.108571
- Mar 1, 2026
- Results in Engineering
Machine learning-driven acoustic impedance inversion with globally optimized reservoir characterization for reserve estimation in carbonate reservoirs
- Research Article
- 10.1177/03019233251319062
- Jul 11, 2025
- Ironmaking & Steelmaking: Processes, Products and Applications
Simultaneously controlling the endpoint temperature and composition of the converter steelmaking process within the design range is a key product quality requirement. The oxidation removal of carbon (C) and phosphorus (P) is a complex multiphase reaction at high temperature, which makes its mechanism difficult to analyse and model. According to the distribution of the sampled data, a deep probabilistic ensemble network is first designed to achieve quality prediction for the converter steelmaking process with small prediction error variance. The first network layer (level-0) consists of parallel random forest regression (RFR), extreme random forest (ERF) regression and extreme gradient boosting (XGBoost). The hyperparameters in level-0 are optimised by an intelligent Bayesian optimisation algorithm. The stacked RFR structure can effectively reduce the prediction deviation of a single RFR. The extreme randomness of ERF can enhance the ability of the stacked network to capture the diversity of data features. The parallel computing mechanism of decision trees in XGBoost can improve the training speed of stacked networks. The second network layer (level-1) uses linear regression (LR) to effectively integrate the output features and obtain the quality prediction value. Finally, through comparison with existing networks and ablation tests, the advantages of the proposed network are verified using production data from an actual steelmaking plant. The variances of the proposed network for the endpoint temperature, C and P are 73.244, 1.8 × 10⁻⁵ and 4.8 × 10⁻⁶, respectively, and other evaluation indicators are also significantly improved while relying only on small sample data.
- Research Article
1
- 10.21108/ijoict.v9i2.865
- Dec 29, 2023
- International Journal on Information and Communication Technology (IJoICT)
This research paper focuses on predicting the dispersion of carbon emissions, a crucial indicator for identifying potential forest fire hotspots in the wooded regions of Sumatra Island, Indonesia. Forest fires, often triggered by extended periods of dry weather, result in significant environmental degradation, impacting both the ecosystem and the economy. Furthermore, health concerns arise from smoke inhalation, leading to respiratory problems. To achieve this predictive capability, we harnessed valuable datasets, including GFED4.1s for carbon emissions and ERA5 for historical climate indicators, spanning from 1998 to 2022. Employing supervised learning ensemble methods, specifically Random Forest Regression (RFR) and Gradient Boosting Regression (GBR), we sought to forecast carbon emissions. It is noteworthy that our predictions encompassed carbon emission values from 1998 to 2023, providing insights into recent trends. Our analysis showed that GBR did better than RFR in terms of evaluation metrics, with a root mean square error (RMSE) of 10.87 and a mean absolute error (MAE) of 2.91. This was done by carefully tuning the hyperparameters. Additionally, our study highlighted that precipitation, temperature, and humidity were the primary climate factors influencing carbon emission values.