Abstract
Cost–benefit analysis is widely used to elucidate the association between foraging group size and resource size. Despite advances in the development of theoretical frameworks, however, the empirical systems used for testing are hindered by the vagaries of field surveys and incomplete data. This study developed the three approaches to data imputation based on machine learning (ML) algorithms with the aim of rescuing valuable field data. Using 163 host spider webs (132 complete data and 31 incomplete data), our results indicated that the data imputation based on random forest algorithm outperformed classification and regression trees, the k-nearest neighbor, and other conventional approaches (Wilcoxon signed-rank test and correlation difference have p-value from < 0.001–0.030). We then used rescued data based on a natural system involving kleptoparasitic spiders from Taiwan and Vietnam (Argyrodes miniaceus, Theridiidae) to test the occurrence and group size of kleptoparasites in natural populations. Our partial least-squares path modelling (PLS-PM) results demonstrated that the size of the host web (T = 6.890, p = 0.000) is a significant feature affecting group size. The resource size (T = 2.590, p = 0.010) and the microclimate (T = 3.230, p = 0.001) are significant features affecting the presence of kleptoparasites. The test of conformation of group size distribution to the ideal free distribution (IFD) model revealed that predictions pertaining to per-capita resource size were underestimated (bootstrap resampling mean slopes <IFD predicted slopes, p < 0.001). These findings highlight the importance of applying appropriate ML methods to the handling of missing field data.
Highlights
The proposed data imputation method using the random forest (RF) approach could be of considerable benefit to field data imputation method using the RF approach could be of considerable benefit to field ecologists examining the relationship between population dispersion and environmental ecologists examining the relationship between population dispersion and environmental factors
We analyzed the performance of MEAN, ZERO, k-nearest neighbor (KNN), classification and regression trees (CART), and RF methods in terms of performance of data imputation, time, and space complexities
After data imputation using the proposed machine learning (ML)-based data imputation processes, we determined that the RF method is superior to other imputation methods in dealing with continuous and discrete data, noise, and outlier data
Summary
It is common to see multiple conspecific individuals foraging in a single resource patch. The simplest explanation for this phenomenon is the gathering of individuals in the vicinity of resources that are patchily distributed [1]. Theoretical models of foraging behavior have been developed to facilitate the prediction of the foraging group size [2]. Field ecologists frequently conduct ecological surveys of resource utility strategies in natural populations; the data they bring back are often incomplete due to instrument failure, human error, or weather conditions [3]. Researchers require a large number of samples to reveal features that could be used to predict the number of individuals in a resource patch and to assess the fitness costs and benefits of remaining within a group of conspecifics. The data used for this cost–benefit analysis must first undergo pre-processing to compensate for missing values and eliminate noise [4]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.