Abstract

Proteins are the major carriers of biological processes, and the extant proteome contains tremendous diversity. However, the theoretical diversity of proteins greatly exceeds that of currently known proteins, largely because of evolutionary constraints. Here, we propose that untouched protein space, comprising both extant proteins of unknown function and unnatural proteins, could contain many proteins with desired functions, and we outline a roadmap for exploring this space with artificial intelligence. In particular, with methods developed in natural language processing (NLP), we can first identify a large number of functional proteins and peptides encrypted in biological big data, for instance microbiome and virome data. Second, larger-scale mutation and directed evolution can be carried out, facilitated by NLP, to improve the function of known proteins. Lastly, sampling random sequences and applying NLP might reveal a more complete landscape of protein functions and enable de novo protein design.

