This article presents the development of a machine learning model that recovers incomplete data using cloud computing. The problem is examined through a case study: modeling data to fill in missing vegetation index values using the open data catalogs of cloud computing platforms. The proposed methodology relies on multi-year periodic sampling of vegetation index values and on training the model on large volumes of data to improve the quality of time series reconstruction. The approach achieves higher accuracy than classical interpolation methods for data recovery, which makes the modeled values suitable for a range of practical tasks. The method is demonstrated by restoring values of the Normalized Difference Vegetation Index (NDVI), which is used to monitor and assess the state of vegetation cover. The input data were arrays of values for the central part of the Novgorod Region, obtained from the catalogs of Google Earth Engine, a cloud environment for processing and analyzing Earth remote sensing data. To speed up model training and improve efficiency, the Google Colaboratory platform was used, which removed the need for local computing capacity and specialized software. The approach can be adapted to reconstruct other indices or to address data incompleteness in other subject areas, which underlines its versatility and potential for practical application.
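The contrast between classical interpolation and a fill based on multi-year periodic sampling can be illustrated with a minimal sketch. It assumes NumPy and a synthetic, perfectly periodic NDVI series. The function names (`ndvi`, `fill_linear`, `fill_periodic`) are hypothetical, and the seasonal-mean fill is only a simple stand-in for the idea of using the same point of the season in other years. It is not the machine learning model developed in the study.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def fill_linear(series):
    """Classical baseline: linear interpolation across gaps in a 1-D series."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])

def fill_periodic(series_by_year):
    """Fill each gap with the mean of the same time step in the other years.

    series_by_year has shape (n_years, n_steps_per_year).
    """
    data = np.asarray(series_by_year, dtype=float)
    seasonal_mean = np.nanmean(data, axis=0)  # per-step mean over years
    return np.where(np.isnan(data), seasonal_mean, data)

# Synthetic example (assumed data): three identical seasons of 12 steps,
# with a gap around the seasonal peak of the second year.
steps = np.arange(12)
season = 0.5 + 0.3 * np.sin(2 * np.pi * (steps - 3) / 12)
years = np.tile(season, (3, 1))
years[1, 5:8] = np.nan

filled = fill_periodic(years)                          # periodic fill
baseline = fill_linear(years.ravel()).reshape(3, 12)   # linear baseline

err_periodic = np.abs(filled[1] - season).max()
err_linear = np.abs(baseline[1] - season).max()
```

On this idealized series the periodic fill recovers the gap exactly, while linear interpolation flattens the seasonal peak. Real NDVI series vary from year to year, which is where a model trained on large volumes of data would be expected to help.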