Abstract

Simple Summary

Wastewater polluted with heavy metals is a global challenge that contributes to an insufficient supply of clean water. Nature offers part of a solution: several organisms, including microalgae, can natively bioremediate these heavy metals. However, these natural processes are not effective enough, especially given the growing amount of pollution today. To create more effective strains, synthetic biology is widely used to enhance heavy metal bio-removal capability, either by engineering an organism's native ability directly or by transferring that ability to a more suitable host. Stepwise engineering of this kind requires a list of the genes or proteins involved in these processes, yet much of this information remains to be discovered. In this work, we constructed a comprehensive library of microalgal putative proteins potentially involved in heavy metal bio-removal. The 3D structures of these proteins were also predicted with machine learning-based methods to further support synthetic biology applications.

Synthetic biology is a field that aims to create new biological systems with particular functions, or to redesign existing ones, through bioengineering. It is therefore often used as a tool to put acquired knowledge to practical use in real-world applications. However, much information remains to be found, which limits the use of synthetic biology, particularly for heavy metal bio-removal, the focus of the present work. In this work, we aim to construct a comprehensive library of putative proteins that might support heavy metal bio-removal. Hypothetical proteins were identified in the Chlorella and Scenedesmus genomes and extensively annotated.
The structures of these putative proteins were also modeled with AlphaFold2. Although parts of this workflow have previously been used to annotate hypothetical proteins from whole-genome sequences, these steps had not yet been adapted for library construction. We also demonstrated further downstream steps that enable more accurate function prediction of the hypothetical proteins by subjecting the generated models to structure-based annotation. In total, 72 newly discovered putative proteins were annotated, with ready-to-use predicted structures available for further investigation.
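The abstract does not say how the AlphaFold2 models were screened before structure-based annotation. As a hypothetical illustration only, one common pre-filter keeps models whose mean pLDDT reaches a cutoff. pLDDT is AlphaFold2's per-residue confidence score, which it writes into the B-factor column of its PDB output. The 70.0 cutoff, the function names, and the input layout (a mapping from protein ID to PDB text) are all assumptions for this sketch, not the authors' pipeline.

```python
from __future__ import annotations

from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Mean per-residue pLDDT of an AlphaFold2 PDB model.

    AlphaFold2 stores each residue's pLDDT in the B-factor field
    (PDB columns 61-66). Reading only C-alpha atoms yields exactly
    one value per residue.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies PDB columns 13-16 (slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return mean(scores)


def select_confident_models(models: dict[str, str], cutoff: float = 70.0) -> list[str]:
    """Return sorted IDs of models whose mean pLDDT meets the cutoff.

    The default cutoff of 70.0 is an assumed, commonly used threshold
    for a "confident" AlphaFold2 prediction, not a value from the source.
    """
    return sorted(pid for pid, pdb in models.items() if mean_plddt(pdb) >= cutoff)
```

Averaging over C-alpha atoms rather than all atoms avoids weighting large side chains more heavily, since AlphaFold2 repeats the same residue-level pLDDT on every atom of that residue.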
