Abstract

Training datasets are a crucial component of any machine learning approach, with significant human effort spent creating and curating these for specific applications. However, a historical absence of standards has resulted in inconsistent and heterogeneous training datasets with limited discoverability and interoperability. Therefore, there is a need for best practices and guidelines for generating, structuring, describing, and curating training datasets.

The Open Geospatial Consortium (OGC) Testbed-18 initiative covered several topics related to geospatial data, focussing on issues around cataloguing and interoperability. Within Testbed-18, the Machine Learning Training Datasets task aimed to develop a foundation for future standardization of training datasets for Earth observation applications.

For this task, members from Pixalytics, FrontierSI, and Curtin University authored an Engineering Report that reviewed:

· Examples of how training datasets have been used in Earth observation applications
· The current best-practice methods for documenting training datasets
· The various requirements for training dataset metadata
· How the Findability, Accessibility, Interoperability, and Reuse (FAIR) principles apply to training datasets

The Engineering Report provides a foundation that OGC can leverage in creating the future standard for machine learning training data for Earth observation applications. The Engineering Report also provides a useful overview of the state of work and key considerations for anyone wishing to improve how they document their training datasets.

In our presentation, we discuss the key findings from the Engineering Report, including key metadata identified from Earth observation use cases, the current state of the art, thoughts on cataloguing and describing training data quality, and how the FAIR principles apply to training data.
