Abstract
Classical supervised methods such as linear regression and decision trees are not well suited to identifying the factors that influence a response variable consisting of zero-inflated proportion data (ZIPD), which are dependent, continuous, and bounded. In this article, we propose a within-block permutation-based methodology to identify factors (discrete or continuous) that are significantly correlated with ZIPD. We also propose a performance indicator that quantifies the percentage of correlation explained by the subset of significant factors, and we show how to predict the ranks of the response variables conditionally on the observed values of these factors. The methodology is illustrated on simulated data and on two real epidemiological data sets. In the first data set, the ZIPD are probabilities of transmission of influenza between horses. In the second data set, the ZIPD are probabilities that geographic entities (e.g., states and countries) have the same COVID-19 mortality dynamics.
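To make the idea of a within-block permutation test concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the association is measured with Spearman's rank correlation, that the dependence structure is summarized by a vector of block labels, and that the null distribution is obtained by shuffling the response only inside each block. The function names (`average_ranks`, `spearman`, `within_block_permutation_test`) and the parameters `n_perm` and `seed` are hypothetical and do not come from the article.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank.

    Ties matter here because zero-inflated data contain many exact zeros.
    """
    values = np.asarray(values).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)        # last rank occupied by each tie group
    first = last - counts + 1       # first rank occupied by each tie group
    return ((first + last) / 2.0)[inverse]


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def within_block_permutation_test(x, y, blocks, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Assumption (not from the article): observations are exchangeable
    within a block, so y is shuffled inside each block only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    blocks = np.asarray(blocks)
    rng = np.random.default_rng(seed)

    observed = spearman(x, y)
    block_indices = [np.flatnonzero(blocks == b) for b in np.unique(blocks)]

    exceed = 0
    for _ in range(n_perm):
        y_perm = y.copy()
        for idx in block_indices:
            # Reassign the responses of this block among its own members.
            y_perm[idx] = y[rng.permutation(idx)]
        if abs(spearman(x, y_perm)) >= abs(observed):
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `within_block_permutation_test(x, y, blocks)` returns the observed correlation and its permutation p-value. Restricting the shuffles to blocks is what keeps the between-block dependence intact under the null hypothesis; a discrete factor would need a different association measure than the one assumed here.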