Abstract

This paper describes a novel machine learning (ML) model that predicts the performance of different metal oxide photocatalysts on a wide range of contaminants. The structures of the metal oxide photocatalysts are encoded with a crystal graph convolution neural network (CGCNN), and the structures of the organic contaminants are encoded as digital molecular fingerprints (MF). The encoded features of the photocatalysts and contaminants are input to an artificial neural network (ANN); the combined model is named CGCNN-MF-ANN. The CGCNN-MF-ANN model gives good predictions of the photocatalytic degradation rate constants of different photocatalysts over a wide range of organic contaminants. The effects of the data training strategy on model performance are compared, and the effects of different factors on photocatalytic degradation performance are evaluated by feature importance analyses. Examples illustrate how the model can be used to select the optimal photocatalyst and to assess other types of photocatalysts for different environmental applications.
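The data flow described above (a photocatalyst embedding and a contaminant fingerprint joined into one input vector for an ANN) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vector sizes, layer widths, ReLU activation, and the names `init_layer`, `dense`, and `predict_rate_constant` are all assumptions, the catalyst vector merely stands in for a CGCNN output, and the weights are random and untrained, so the returned number has no physical meaning.

```python
import math
import random


def relu(v):
    """Rectified linear activation (an assumed choice)."""
    return max(0.0, v)


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer with random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, activation=None):
    """Apply one fully connected layer to the input vector x."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def predict_rate_constant(catalyst_vec, fingerprint_bits, layers):
    """Concatenate both encodings and pass them through the ANN layers."""
    x = list(catalyst_vec) + [float(b) for b in fingerprint_bits]
    for i, (weights, biases) in enumerate(layers):
        is_last = i == len(layers) - 1
        x = dense(x, weights, biases, None if is_last else relu)
    return x[0]  # single regression output


# Hypothetical inputs: a 4-value stand-in for a CGCNN embedding
# and an 8-bit stand-in for a molecular fingerprint.
rng = random.Random(0)
catalyst_vec = [0.12, -0.40, 0.88, 0.05]
fingerprint_bits = [1, 0, 0, 1, 1, 0, 1, 0]
n_in = len(catalyst_vec) + len(fingerprint_bits)
layers = [init_layer(n_in, 6, rng), init_layer(6, 1, rng)]
k_pred = predict_rate_constant(catalyst_vec, fingerprint_bits, layers)
```

In a real model the weights would be fitted to measured rate constants, and the fingerprint would typically be far longer than eight bits.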

Highlights

  • Water pollution associated with the increasing amount of human and industrial activities has become an emerging environmental issue that threatens the health of people and animals [1]

  • The crystal graph convolution neural network (CGCNN)-molecular fingerprints (MF)-artificial neural network (ANN) machine learning (ML) model with optimal hyperparameters was trained with a three-fold cross-validation method

  • The complete dataset was randomly split into three subgroups; in each round, two of the subgroups were used for model training and the remaining one was used for testing
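The three-fold split described in the highlights can be sketched as below. This is an illustrative sketch only: the function name `three_fold_splits`, the seed, and the sample count of 10 are assumptions, and the source does not say how the random split was actually generated.

```python
import random


def three_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for a random three-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into three subgroups of near-equal size.
    folds = [indices[i::3] for i in range(3)]
    for k in range(3):
        # Subgroup k is held out for testing; the other two are used for training.
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


# Hypothetical dataset of 10 samples.
splits = list(three_fold_splits(10))
```

Each sample appears in exactly one test subgroup, so every data point is used for testing once and for training twice.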


Summary

Introduction

Water pollution associated with the increasing amount of human and industrial activities has become an emerging environmental issue that threatens the health of people and animals [1]. Metal-oxide semiconductor photocatalysts are capable of degrading organic compounds in contaminated water. Assessing their performance with the conventional experimental approach requires tremendous effort and investment, given the complex structure of photocatalysts and the wide range of contaminants. Recent progress in machine learning (ML) enables a data-driven approach that makes the investigation and prediction of photocatalyst performance much more efficient. An ML model can make full use of experimental data in the published literature and can generate results that guide subsequent experimental designs. This saves significant time and labor compared with the conventional experimental approach. The CGCNN-MF-ANN model achieved satisfactory and consistent performance by learning the connections between the experimental variables (the types of photocatalysts, contaminants, and experimental conditions) and the photocatalytic activities. It can be used to predict the performance of new photocatalysts and to select the best photocatalyst for the degradation of a range of contaminants.

Results of ML Model Prediction
Model Interpretability via Feature Importance
Performance of CGCNN-MF-ANN ML Model for Different Types of Contaminants
Machine Learning Model Structure and Optimization
