Abstract
Objectives: In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability.
Material and methods: Two improved algorithms, denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and of other methods, including conventional LASSO, Bolasso, stepwise and stability selection models, was evaluated using intensive simulation. In addition, the methods were compared in an empirical analysis based on large-scale survey data of hepatitis B infection-relevant factors among Guangdong residents.
Results: The proposed procedures produced comparable or less biased selection results when compared to conventional variable selection models. Overall, the two newly proposed procedures were stable across the simulation scenarios, demonstrating a higher power and a lower false positive rate during variable selection than the compared methods. In the empirical analysis, the proposed procedures yielded a sparse set of hepatitis B infection-relevant factors and gave the best predictive performance, showing that they were able to select a more stringent set of factors. Individual history of hepatitis B vaccination and family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents according to the proposed procedures.
Conclusions: The newly proposed procedures improve the identification of significant variables and enable us to derive a new insight into epidemiological association analysis.
Highlights
The two newly proposed procedures were stable with respect to various scenarios of simulation, demonstrating a higher power and a lower false positive rate during variable selection than the compared methods
The newly proposed procedures improve the identification of significant variables and enable us to derive a new insight into epidemiological association analysis
The variable selection technique is employed for epidemiologic analysis to identify independent associations between collective exposures and a health outcome [1]
Summary
The variable selection technique is employed for epidemiologic analysis to identify independent associations between collective exposures and a health outcome [1]. Automatic variable selection using stepwise regression is the most widely used method. However, it is not always optimal when applied to identify independent associations in large epidemiologic data sets with many predictors [6, 7]. The stepwise selection technique ignores stochastic errors inherent in the stages of variable selection and suffers from a lack of stability [11]. In this case, a model using univariate or non-penalized regression modeling approaches is likely to overfit the data and generate findings that will not generalize well when extended to new data. By shrinking variables with very unstable estimates towards zero, the LASSO model can effectively exclude some irrelevant variables and produce sparse estimates.
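The shrinkage mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a plain linear LASSO by cyclic coordinate descent in pure Python on simulated data, and every name in it (soft_threshold, lasso_coordinate_descent, the penalty value lam=0.3, the simulated coefficients) is an assumption chosen for illustration. The soft-thresholding step is what sets small, noise-driven coefficients to exactly zero and so yields a sparse model.

```python
import random


def soft_threshold(z, lam):
    """Move z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Simulated data (illustrative only): 2 relevant and 4 irrelevant predictors.
random.seed(0)
n_obs = 300
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n_obs)]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0.0, 1.0) for row in X]

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("estimates:", [round(b, 3) for b in beta_hat])
print("selected predictor indices:", selected)
```

In this sketch the retained coefficients are also pulled toward zero by roughly the penalty size, which is the kind of estimation bias the abstract attributes to the conventional LASSO. An epidemiological analysis with a binary outcome would more likely use a penalized logistic model from an established library than this hand-written linear solver.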