Machine Learning Based Prediction of Enzymatic Degradation of Plastics Using Encoded Protein Sequence and Effective Feature Representation

Renjing Jiang, Lanyu Shang, Dong Wang, Ruohan Wang, Na Wei

doi:10.1021/acs.estlett.3c00293

Abstract

Enzyme biocatalysis for plastic treatment and recycling is an emerging field of growing interest. However, it is challenging and time-consuming to identify plastic-degrading enzymes with desirable functionality, given the large number of putative enzyme sequences. There is a critical need to develop an effective approach to accurately predict the enzyme activity in degrading different types of plastics. In this study, we developed a machine-learning-based plastic enzymatic degradation (PED) framework to predict the ability of an enzyme to degrade plastics of interest by exploring and recognizing hidden patterns in protein sequences. A data set integrating information from a wide range of experimentally verified enzymes and various common plastic substrates was created. A new context-aware enzyme sequence representation (CESR) mechanism was developed to learn the abundant contextual information in enzyme sequences, and feature extraction was performed for enzymes at both the amino acid level and global sequence level. Thirteen machine learning classification algorithms were compared, and XGBoost was identified as the best-performing algorithm. PED achieved an overall accuracy of 90.2% and outperformed sequence-based protein classification models from the existing literature. Furthermore, important enzyme features in plastic degradation were identified and comprehensively interpreted. This study demonstrated a new tool for the prediction and discovery of plastic-degrading enzymes.
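To make the two feature levels mentioned in the abstract concrete, the following is a minimal sketch of what amino-acid-level and global sequence-level features could look like for a single enzyme sequence. It is not the CESR mechanism described by the authors, which learns contextual information. One-hot encoding and amino acid composition are used here purely as simple stand-ins, and the function names and the example sequence are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_level_features(sequence):
    """Amino acid level: one one-hot vector per residue (illustrative stand-in)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in index:  # non-standard residues are left as all-zero rows
            row[index[aa]] = 1
        rows.append(row)
    return rows


def global_composition(sequence):
    """Global sequence level: fraction of each standard residue in the sequence."""
    seq = sequence.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]


example = "MKTAYIAKQR"  # hypothetical fragment, not taken from the study's data set
per_residue = residue_level_features(example)
composition = global_composition(example)
```

In a pipeline of the kind the abstract describes, vectors like these would be combined per enzyme, paired with a plastic substrate label, and passed to a classifier such as XGBoost. The actual features, encoding, and model settings are those reported in the full text.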

Full Text