Abstract

<p>Machine learning (ML) models that are robust, efficient, and able to generalize well rely on the assumption that their training data are independent and identically distributed (i.i.d.). Violating this assumption may cause these highly flexible methods to overfit the training data and to underestimate spatial prediction errors. Because the models then appear more reliable than they are, the assessment of their capability to generalize the learned relationship to independent data becomes biased, and the resulting models may have poor overall prediction accuracy.</p>
<p>Spatial data are a special kind of data for which the i.i.d. assumption often does not hold because of spatial autocorrelation. Cross-validation is a very common resampling method, both for tuning ML models and for assessing their predictive capabilities. Studies have shown that using random cross-validation with spatial data can produce overoptimistic results because the i.i.d. assumption is violated. To mitigate this problem, spatial cross-validation has been proposed as an alternative: it splits the data into spatially disjoint subsets, which are then used as the cross-validation folds.</p>
<p>In the context of the MEDSAL Project (www.medsal.net), data on multiple covariates were collected in order to study groundwater salinization. Machine learning was applied to predict salinity concentration from these data. This presentation shows some of the results of the ML analysis, along with the effect of spatial autocorrelation on the prediction capabilities of the ML models. The effect was assessed by comparing the prediction results of ML models built with random cross-validation against those built with spatial cross-validation. Possible spatial autocorrelation in water data, along with time series autocorrelation, is an important issue that data analysts should study and address, especially when applying ML analysis and modeling.</p>
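<p>The contrast between random and spatial cross-validation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the MEDSAL workflow: the data are synthetic, the spatially disjoint subsets are assumed to be square grid blocks, the predictor is a simple 1-nearest-neighbour rule, and all function names (<code>spatial_block_folds</code>, <code>random_folds</code>, <code>train_test_indices</code>, <code>cv_rmse</code>) are hypothetical.</p>

```python
import math
import random


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign folds so that all points in one square grid block share a fold."""
    # Map every point to the grid block that contains it (assumed blocking scheme).
    block_of = [(math.floor(x / block_size), math.floor(y / block_size))
                for x, y in coords]
    blocks = sorted(set(block_of))
    if len(blocks) < n_folds:
        raise ValueError("need at least as many occupied blocks as folds")
    # Shuffle the blocks, then deal them out to folds round-robin.
    random.Random(seed).shuffle(blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


def random_folds(n_points, n_folds, seed=0):
    """Assign folds to individual points at random, ignoring location."""
    order = list(range(n_points))
    random.Random(seed).shuffle(order)
    folds = [0] * n_points
    for rank, i in enumerate(order):
        folds[i] = rank % n_folds
    return folds


def train_test_indices(folds, k):
    """Split point indices into training and test sets for fold k."""
    train = [i for i, f in enumerate(folds) if f != k]
    test = [i for i, f in enumerate(folds) if f == k]
    return train, test


def cv_rmse(coords, values, folds, n_folds):
    """Cross-validated RMSE of a 1-nearest-neighbour predictor, pooled over folds."""
    sq_err = 0.0
    for k in range(n_folds):
        train, test = train_test_indices(folds, k)
        for i in test:
            # Predict with the value of the closest training point.
            nearest = min(train, key=lambda j: math.dist(coords[i], coords[j]))
            sq_err += (values[i] - values[nearest]) ** 2
    return math.sqrt(sq_err / len(coords))


# Synthetic, spatially autocorrelated target: a smooth surface plus small noise.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(300)]
values = [math.sin(x / 15) + math.cos(y / 15) + rng.gauss(0, 0.05)
          for x, y in coords]

n_folds = 5
rand = random_folds(len(coords), n_folds)
spat = spatial_block_folds(coords, block_size=25, n_folds=n_folds)

print("random CV RMSE :", cv_rmse(coords, values, rand, n_folds))
print("spatial CV RMSE:", cv_rmse(coords, values, spat, n_folds))
```

<p>With random folds, each test point usually has a close neighbour in the training set, so the error estimate benefits from spatial autocorrelation. With block folds, the nearest training point lies in a different block, which is closer to the situation of predicting at a genuinely new location. The block size is a tuning choice; in practice it is often chosen in relation to the range of spatial autocorrelation in the data.</p>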
