Number Of Candidate Variables Research Articles

Abstract. In spite of the great abundance and ecological importance of headwater streams, managers are usually limited by a lack of information about water chemistry in these headwaters. In this study we test whether river outlet chemistry can be used as an additional source of information to improve the prediction of the chemistry of upstream headwaters (size < 2 km²), relative to models based on map information alone. We use the concentration of total organic carbon (TOC), an important stream ecosystem parameter, as the target for our study. Between 2000 and 2008, we carried out 17 synoptic surveys in 9 mesoscale catchments (size 32–235 km²). Over 900 water samples were collected in total, primarily from headwater streams but also including each catchment's river outlet during every survey. First we used partial least squares regression (PLS) to model the distribution (median, interquartile range (IQR)) of headwater stream TOC for a given catchment, based on a large number of candidate variables including sub-catchment characteristics from GIS and measured river chemistry at the catchment outlet. The best candidate variables from the PLS models were then used in hierarchical linear mixed models (MM) to model TOC in individual headwater streams. Three predictor variables were consistently selected for the MM calibration sets: (1) proportion of forested wetlands in the sub-catchment (positively correlated with headwater stream TOC), (2) proportion of lake surface cover in the sub-catchment (negatively correlated with headwater stream TOC), and (3) river outlet TOC (positively correlated with headwater stream TOC). Including river outlet TOC improved predictions, with 5–15% lower prediction errors than when using map information alone. Thus, data on water chemistry measured at river outlets offer information which can complement GIS-based modelling of headwater stream chemistry.
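
For orientation, one possible form of a mixed model with the three selected predictors is sketched below. The abstract does not state the random-effects structure or any transformation of TOC, so the random intercept $u_j$, the grouping index $j$ (here assumed to be a catchment-survey combination), and the symbols $\mathrm{FW}$ (forested wetland proportion) and $\mathrm{LK}$ (lake surface proportion) are illustrative assumptions, not the authors' specification.

```latex
\[
\mathrm{TOC}_{ij} = \beta_0
  + \beta_1\,\mathrm{FW}_{ij}
  + \beta_2\,\mathrm{LK}_{ij}
  + \beta_3\,\mathrm{TOC}^{\mathrm{outlet}}_{j}
  + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $i$ indexes headwater streams within group $j$. The correlations reported in the abstract would correspond to $\beta_1 > 0$, $\beta_2 < 0$ and $\beta_3 > 0$.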

Background. As public awareness of the consequences of environmental exposures has grown, estimating the adverse health effects of simultaneous exposure to multiple pollutants has become an important topic. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identification of the most critical components of the pollutant mixture, examination of potential interaction effects, and attribution of health effects to individual pollutants in the presence of multicollinearity.

Methods. In this paper, we reviewed five methods available in the statistical literature that are potentially helpful for constructing multipollutant models. We conducted a simulation study and presented two data examples to assess the performance of these methods on feature selection, effect estimation and interaction identification using both cross-sectional and time-series designs. We also proposed and evaluated a two-step strategy employing an initial screening by a tree-based method followed by further dimension reduction/variable selection by the aforementioned five approaches at the second step.

Results. Among the five methods, least absolute shrinkage and selection operator regression performs well in general for identifying important exposures, but yields biased estimates and a slightly larger model dimension given many correlated candidate exposures and a modest sample size. Bayesian model averaging and supervised principal component analysis are also useful in variable selection when there is a moderately strong exposure-response association. Substantial improvements in reducing model dimension and identifying important variables were observed for all five statistical methods using the two-step modeling strategy when the number of candidate variables is large.

Conclusions. There is no uniform dominance of one method across all simulation scenarios and all criteria. The performances differ according to the nature of the response variable, the sample size, the number of pollutants involved, and the strength of exposure-response association/interaction. However, the two-step modeling strategy proposed here is potentially applicable under a multipollutant framework with many covariates by taking advantage of both the screening feature of an initial tree-based method and the dimension reduction/variable selection property of the subsequent method. The choice of the method should also depend on the goal of the study: risk prediction, effect estimation or screening for important predictors and their interactions.
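
The two-step idea (screen with a tree-based method, then select with a penalized regression) can be illustrated with a minimal sketch. It is not the authors' implementation: a depth-1 regression tree (a "stump") stands in for the unspecified tree-based screener, LASSO is fitted by plain coordinate descent, and the function names, the synthetic data, and the values of `keep` and `lam` are all hypothetical choices made for illustration.

```python
import random

def stump_gain(x, y):
    """Largest drop in squared error from one split on x (a depth-1 regression tree)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    total, total_sq = sum(ys), sum(v * v for v in ys)
    sse_root = total_sq - total * total / n
    best, left, left_sq = 0.0, 0.0, 0.0
    for k in range(1, n):
        left += ys[k - 1]
        left_sq += ys[k - 1] ** 2
        if xs[k] == xs[k - 1]:
            continue  # cannot split between identical values
        right, right_sq = total - left, total_sq - left_sq
        sse = (left_sq - left * left / k) + (right_sq - right * right / (n - k))
        best = max(best, sse_root - sse)
    return best

def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    m = sum(col) / len(col)
    s = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
    return [(v - m) / s for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(cols, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Columns have unit variance, so the partial-residual correlation is:
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

def two_step_select(X_cols, y, keep, lam):
    """Step 1: keep the `keep` columns with the largest stump gain. Step 2: LASSO on those."""
    gains = [stump_gain(col, y) for col in X_cols]
    screened = sorted(range(len(X_cols)), key=lambda j: gains[j], reverse=True)[:keep]
    beta = lasso([standardize(X_cols[j]) for j in screened], y, lam)
    return {j: b for j, b in zip(screened, beta) if b != 0.0}

# Synthetic example (hypothetical data): 30 candidate exposures, only 0 and 3 matter.
random.seed(0)
n, p = 200, 30
X_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X_cols[0][i] - 1.5 * X_cols[3][i] + random.gauss(0, 1) for i in range(n)]
selected = two_step_select(X_cols, y, keep=10, lam=0.3)
print(sorted(selected))
```

The screening step only ranks candidates by marginal association, so interactions with no marginal signal could be missed by this simplified stand-in. The nonzero LASSO coefficients are shrunk toward zero, which is consistent with the biased estimates the abstract mentions.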

Articles published on Number Of Candidate Variables

The predictability of cross-sectional returns in high frequency

Machine learning methods for “wicked” problems: exploring the complex drivers of modern slavery

Classification for Determining Drug Type Using the Decision Tree Algorithm (Klasifikasi Penentuan Jenis Obat Menggunakan Algoritma Decision Tree)

On model selection from a finite family of possibly misspecified time series models

Regularization parameter selection for penalized empirical likelihood estimator

Mode jumping MCMC for Bayesian variable selection in GLMM

Variable selection - A review and recommendations for the practicing statistician.

Regularization Parameter Selection for Penalized Empirical Likelihood Estimator

Handling co-dependence issues in resampling-based variable selection procedures: a simulation study

Investigation of powered 2-wheeler accident involvement in urban arterials by considering real-time traffic and weather data

Tree-Based Conditional Portfolio Sorts: The Relation between Past and Future Stock Returns

Map-based prediction of organic carbon in headwater streams improved by downstream observations from the river outlet

Variable selection for modeling the absolute magnitude at maximum of Type Ia supernovae

Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons.

Gaining insight with recursive partitioning of generalized linear models

The benefit of data-based model complexity selection via prediction error curves in time-to-event data

Investigation of habitat preferences of Iranian jerboa (Allactaga firouzi Womochel 1978)

Gibbs posterior for variable selection in high-dimensional classification and data mining

Multi-step methods for choosing the best set of variables in regression analysis

Generalization error for multi-class margin classification
