Machine Learning Data Imputation and Prediction of Foraging Group Size in a Kleptoparasitic Spider

Yong-Chao Su, Bo-Sheng Li, Cheng-Hong Yang, Yu-Da Lin, Cheng-Yu Wu, Sin-Hua Moi

doi:10.3390/math9040415

Abstract

Cost–benefit analysis is widely used to elucidate the association between foraging group size and resource size. Despite advances in the development of theoretical frameworks, however, the empirical systems used for testing are hindered by the vagaries of field surveys and incomplete data. This study developed three approaches to data imputation based on machine learning (ML) algorithms with the aim of rescuing valuable field data. Using data from 163 host spider webs (132 with complete data and 31 with incomplete data), we found that data imputation based on the random forest algorithm outperformed classification and regression trees, k-nearest neighbor, and other conventional approaches (Wilcoxon signed-rank tests and correlation differences yielded p-values ranging from < 0.001 to 0.030). We then used the rescued data from a natural system involving kleptoparasitic spiders from Taiwan and Vietnam (Argyrodes miniaceus, Theridiidae) to test the occurrence and group size of kleptoparasites in natural populations. Our partial least-squares path modelling (PLS-PM) results demonstrated that the size of the host web (T = 6.890, p < 0.001) is a significant feature affecting group size. The resource size (T = 2.590, p = 0.010) and the microclimate (T = 3.230, p = 0.001) are significant features affecting the presence of kleptoparasites. The test of whether the group size distribution conforms to the ideal free distribution (IFD) model revealed that predictions pertaining to per-capita resource size were underestimated (bootstrap resampling mean slopes < IFD predicted slopes, p < 0.001). These findings highlight the importance of applying appropriate ML methods to the handling of missing field data.
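The bootstrap comparison of slopes mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the data are synthetic, the variable names (`resource`, `group_size`) are hypothetical, and the reference slope `predicted_slope = 1.0` is a placeholder assumption, since the abstract does not state the slope predicted by the IFD model. The sketch only shows the general idea of resampling observations with replacement, refitting a least-squares slope each time, and asking how often the resampled slope reaches the reference value.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_boot=2000, seed=0):
    """Refit the slope on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Synthetic, hypothetical data: group size grows more slowly with
# resource size (true slope 0.6) than the placeholder reference slope.
data_rng = random.Random(42)
resource = [data_rng.uniform(1.0, 10.0) for _ in range(60)]
group_size = [0.6 * x + data_rng.gauss(0.0, 1.0) for x in resource]

predicted_slope = 1.0  # assumption for illustration, not taken from the paper

slopes = bootstrap_slopes(resource, group_size)
mean_slope = statistics.fmean(slopes)
# Share of bootstrap slopes at or above the reference slope
# (a rough one-sided bootstrap p-value).
p_like = sum(s >= predicted_slope for s in slopes) / len(slopes)

print(f"bootstrap mean slope = {mean_slope:.3f}")
print(f"share of slopes >= predicted = {p_like:.4f}")
```

A small share of resampled slopes at or above the reference value would be read as evidence that the observed slope lies below the predicted one, which is the direction of the result reported in the abstract.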

Highlights

The proposed data imputation method using the random forest (RF) approach could be of considerable benefit to field ecologists examining the relationship between population dispersion and environmental factors
We analyzed the MEAN, ZERO, k-nearest neighbor (KNN), classification and regression trees (CART), and RF methods in terms of imputation performance and time and space complexity
After applying the proposed machine learning (ML)-based data imputation processes, we determined that the RF method is superior to the other imputation methods in dealing with continuous and discrete data, noise, and outliers
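One common way to compare imputation methods such as those listed above is to hide values that are actually known, impute them, and score each method against the hidden truth. The sketch below illustrates this for the MEAN, ZERO, and KNN baselines only, using the Python standard library; RF and CART are omitted for brevity. The data are synthetic, the column meanings are hypothetical, and the masking fraction, `k = 3`, and the RMSE score are assumptions made for illustration rather than the settings used in the study.

```python
import math
import random


def mean_impute(rows, col):
    """Fill missing entries (None) in column col with the observed mean."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = sum(observed) / len(observed)
    return [fill if r[col] is None else r[col] for r in rows]


def zero_impute(rows, col):
    """Fill missing entries (None) in column col with zero."""
    return [0.0 if r[col] is None else r[col] for r in rows]


def knn_impute(rows, col, k=3):
    """Fill each missing entry with the mean of col over the k nearest
    complete rows, using Euclidean distance on the other columns
    (which are assumed to have no missing values)."""
    donors = [r for r in rows if r[col] is not None]
    others = [j for j in range(len(rows[0])) if j != col]
    filled = []
    for r in rows:
        if r[col] is not None:
            filled.append(r[col])
            continue
        target = [r[j] for j in others]
        nearest = sorted(
            donors, key=lambda d: math.dist(target, [d[j] for j in others])
        )[:k]
        filled.append(sum(d[col] for d in nearest) / len(nearest))
    return filled


def rmse(pred, truth, idx):
    """Root-mean-square error over the hidden positions only."""
    return math.sqrt(sum((pred[i] - truth[i]) ** 2 for i in idx) / len(idx))


# Synthetic, hypothetical data: column 0 is fully observed and
# column 1 is correlated with it.
rng = random.Random(7)
n = 100
col0 = [rng.uniform(10.0, 100.0) for _ in range(n)]
truth = [0.5 * x + rng.gauss(0.0, 2.0) for x in col0]

# Hide 20% of the known values in column 1 so the error can be measured.
hidden = set(rng.sample(range(n), 20))
masked = [[col0[i], None if i in hidden else truth[i]] for i in range(n)]

methods = {"MEAN": mean_impute, "ZERO": zero_impute, "KNN": knn_impute}
scores = {name: rmse(fn(masked, 1), truth, hidden) for name, fn in methods.items()}

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.2f}")
```

Scoring only the hidden positions keeps the comparison fair, because every method reproduces the observed values exactly and differs only in how it fills the gaps.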

Summary

Introduction

It is common to see multiple conspecific individuals foraging in a single resource patch. The simplest explanation for this phenomenon is the gathering of individuals in the vicinity of resources that are patchily distributed [1]. Theoretical models of foraging behavior have been developed to facilitate the prediction of foraging group size [2]. Field ecologists frequently conduct ecological surveys of resource utility strategies in natural populations; however, the data they bring back are often incomplete due to instrument failure, human error, or weather conditions [3]. Researchers require a large number of samples to reveal features that could be used to predict the number of individuals in a resource patch and to assess the fitness costs and benefits of remaining within a group of conspecifics. The data used for this cost–benefit analysis must first undergo pre-processing to compensate for missing values and eliminate noise [4].

Journal: Mathematics	Publication Date: Feb 20, 2021
Citations: 4	License type: CC BY 4.0
