With the rapid expansion of microplastic research and its reliance on semantic descriptors, there is an increasing need to harmonize plastic pollution data. Data standards have been developed but are seldom implemented across research sectors, geographic regions, environmental media, or size classes of plastic pollution. Harmonization of existing data is currently hindered by increasingly large datasets that use thousands of different categorical descriptors, by the variety of metrics used to describe particle abundance, and by the differing size ranges studied across research groups. In this study, we used manually developed relational databases to build an artificial-intelligence-based algorithm that automatically curates harmonized, more usable datasets describing micro- to macro-sized plastic pollution in the environment. The resulting algorithm, MaTCH (microplastics and trash cleaning and harmonization), can harmonize datasets with different formats, nomenclature, methods, and measured particle characteristics, with an accuracy of 71–94% when matching semantically. All other (non-semantic) corrections are reported with a 95% confidence interval and model uncertainty. All steps of the algorithm are integrated into an open-source software tool to benefit the scientific community and to ease the integration of all plastic pollution data.