Improved Training for Machine Learning: The Additional Potential of Innovative Algorithmic Approaches.

Octavian Dumitru, Zhongling Huang, Dongyang Ao, Mihai Datcu, Mila Stillman, Gottfried Schwarz

doi:10.5194/egusphere-egu21-4683

Abstract

In recent years, machine learning algorithms have made considerable progress. Typical application fields include many technical and commercial applications as well as Earth science analyses, where indirect and distorted detector data most often have to be converted into well-calibrated scientific data, a prerequisite for correctly understanding the desired physical quantities and their relationships.

However, providing sufficient calibrated data is not enough for the testing, training, and routine processing of most machine learning applications. In principle, one also needs a clear strategy for selecting necessary and useful training data, as well as easily understandable quality control of the final desired parameters.

At first glance, one might guess that this problem could be solved by carefully selecting representative test data covering many typical cases as well as some counterexamples. These test data can then be used to train the internal parameters of a machine learning application. On closer inspection, however, many researchers have found that simply stacking up plain examples is not the best choice for many scientific applications.

To obtain improved machine learning results, we concentrated on the analysis of satellite images depicting the Earth’s surface under various conditions such as the selected instrument type, spectral bands, and spatial resolution. In our case, such data are routinely provided by the freely accessible European Sentinel satellite products (e.g., Sentinel-1 and Sentinel-2). Our basic work then included investigations of how some additional processing steps, linked with the selected training data, can provide better machine learning results.

To this end, we analysed and compared three different approaches to identify machine learning strategies for the joint selection and processing of training data for our Earth observation images:

- One can optimise the training data selection by adapting it to the specific instrument, target, and application characteristics [1].
- As an alternative, one can dynamically generate new training parameters with Generative Adversarial Networks. This is comparable to the role of a sparring partner in boxing [2].
- One can also use a hybrid semi-supervised approach for Synthetic Aperture Radar images with limited labelled data. The method is split into four steps: polarimetric scattering classification, topic modelling for scattering labels, unsupervised constraint learning, and supervised label prediction with constraints [3].

We applied these strategies in the ExtremeEarth sea-ice monitoring project (http://earthanalytics.eu/). As a result, we can demonstrate for which application cases these three strategies provide a promising alternative to a simple conventional selection of available training data.

[1] C.O. Dumitru et al., “Understanding Satellite Images: A Data Mining Module for Sentinel Images”, Big Earth Data, 2020, 4(4), pp. 367-408.

[2] D. Ao et al., “Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X”, Remote Sensing, 2018, 10(10), pp. 1-23.

[3] Z. Huang et al., “HDEC-TFA: An Unsupervised Learning Approach for Discovering Physical Scattering Properties of Single-Polarized SAR Images”, IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-18.

Full Text